AI benchmarks are usually abstract. They test math puzzles, programming problems, or reading comprehension tasks most people never encounter in real life. But the newest yardstick for long-term AI performance is surprisingly ordinary: running a vending machine. The benchmark, called Vending-Bench, was created by Andon Labs to test whether AI agents can handle one of their […]

The post Grok 4 beats GPT-5 In Running Business According to Latest Vending-Bench appeared first on Fello AI.