Thought Pokémon was a tough benchmark for AI? One group of researchers argues that Super Mario Bros. is even tougher.
Hao AI Lab, a research organization at the University of California San Diego, on Friday threw AI into live Super Mario Bros. games. Anthropic’s Claude 3.7 performed the best, followed by Claude 3.5. Google’s Gemini 1.5 Pro and OpenAI’s GPT-4o struggled.
It was not quite the same version of Super Mario Bros. as the original 1985 release, to be clear. The game ran in an emulator and was integrated with a framework, GamingAgent, to give the AIs control over Mario.
GamingAgent, which Hao developed in-house, fed the AI basic instructions, like, “If an obstacle or enemy is near, move/jump left to avoid,” and in-game screenshots. The AI then generated inputs in the form of Python code to control Mario.
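The loop described above (an instruction and a screenshot go in, Python code comes out and is run against the game) might look roughly like the following sketch. None of this is GamingAgent's actual code: `FakeController`, `stub_model`, and `step` are hypothetical names invented for illustration, and the model call is replaced by a stub that returns a fixed snippet.

```python
from dataclasses import dataclass, field

# Instruction text as quoted in the article.
INSTRUCTION = "If an obstacle or enemy is near, move/jump left to avoid"


@dataclass
class FakeController:
    """Hypothetical stand-in for an emulator input layer; it only records presses."""
    log: list = field(default_factory=list)

    def press(self, button: str, frames: int = 1) -> None:
        self.log.append((button, frames))


def stub_model(instruction: str, screenshot: bytes) -> str:
    """Placeholder for a real model call; returns Python source as text."""
    # Assumption: a real agent would send the instruction and screenshot to a model here.
    return "controller.press('right', frames=10)\ncontroller.press('A', frames=5)"


def step(controller: FakeController, screenshot: bytes) -> None:
    """One loop iteration: ask the model for code, then run it."""
    code = stub_model(INSTRUCTION, screenshot)
    # Expose only the controller to the generated code. This is not a real sandbox.
    exec(code, {"__builtins__": {}}, {"controller": controller})


controller = FakeController()
step(controller, screenshot=b"")
print(controller.log)
```

A real setup would repeat `step` with a fresh screenshot each time, so the time the model takes to answer directly limits how often Mario's inputs can change.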
Still, Hao says the game forced each model to “learn” to plan complex maneuvers and develop gameplay strategies. Interestingly, the lab found that reasoning models like OpenAI’s o1, which “think” through problems step by step to arrive at solutions, performed worse than “non-reasoning” models, despite being generally stronger on most benchmarks.
One of the main reasons reasoning models have trouble playing real-time games like this is that they take a while, usually seconds, to decide on actions, according to the researchers. In Super Mario Bros., timing is everything. A second can mean the difference between a safely cleared jump and a fall to your death.
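To put the timing point in numbers, here is a small back-of-the-envelope sketch. It assumes the emulated game advances at roughly 60 frames per second and keeps running while the model decides; both are assumptions for illustration, not figures reported by the researchers, and `frames_elapsed` is a hypothetical helper.

```python
FPS = 60  # assumption: the game advances at roughly 60 frames per second


def frames_elapsed(decision_seconds: float, fps: int = FPS) -> int:
    """Frames of game time that pass while a model is still deciding."""
    return round(decision_seconds * fps)


for latency in (0.5, 1.0, 3.0):
    print(f"{latency:.1f} s of thinking = {frames_elapsed(latency)} frames of game time")
```

Under these assumptions, a model that needs a few seconds per decision is reacting to a screen that is already well over a hundred frames out of date.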
Games have been used to benchmark AI for decades. But some experts have questioned the wisdom of drawing connections between AI’s gaming skills and technological progress. Unlike the real world, games tend to be abstract and relatively simple, and they offer a theoretically infinite amount of data to train AI.
The recent flashy gaming benchmarks point to what Andrej Karpathy, a research scientist and founding member of OpenAI, called an “evaluation crisis.”
“I don’t really know what [AI] metrics to look at right now,” he wrote in a post on X. “TLDR my reaction is I don’t really know how good these models are right now.”
At least we can watch AI play Mario.