Not even Pokémon is safe from AI benchmarking controversy.
Last week, a post on X went viral claiming that Google's latest Gemini model had surpassed Anthropic's flagship Claude model in the original Pokémon video games. Gemini had reportedly reached Lavender Town in a developer's livestream; Claude had been stuck at Mount Moon since late February.
Gemini is literally ahead of Claude atm in Pokémon after reaching Lavender Town

119 live viewers only btw, genuinely independent run pic.twitter.com/8avsovi4x

— you (@you21e8) April 10, 2025
But what the post failed to mention is that Gemini had an advantage.
As users on Reddit pointed out, the developer behind the Gemini stream built a custom minimap that helps the model identify "tiles" in the game, such as cuttable trees. This reduces the need for Gemini to analyze screenshots before making gameplay decisions.
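The post doesn't describe how the minimap actually works, but one way such a scaffold could operate is to label each on-screen tile and hand the model a compact text grid instead of raw pixels. The sketch below is purely illustrative; the tile labels, function name, and layout are assumptions, not the stream's actual code.

```python
# Hypothetical sketch of a "minimap" scaffold: rather than asking the model to
# interpret raw screenshots, the harness labels each tile on the visible grid
# and passes the model a small text map. All names here are illustrative.

# Example tile labels the harness might assign (not from the actual stream).
TILE_LABELS = {
    0: ".",   # walkable ground
    1: "#",   # wall / impassable
    2: "T",   # cuttable tree
    3: "D",   # door / warp
    4: "N",   # NPC
}

def render_minimap(tile_grid, player_pos):
    """Convert a 2D grid of tile IDs into a text minimap the model can read.

    tile_grid:  list of rows, each a list of tile IDs (ints above)
    player_pos: (row, col) of the player sprite
    """
    lines = []
    for r, row in enumerate(tile_grid):
        chars = [TILE_LABELS.get(tile, "?") for tile in row]
        if r == player_pos[0]:
            chars[player_pos[1]] = "@"  # mark the player
        lines.append("".join(chars))
    return "\n".join(lines)

# A toy 5x5 screen: the model can see at a glance that a cuttable tree ("T")
# sits northeast of the player ("@"), with no image analysis needed.
screen = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 2, 1],
    [1, 0, 0, 0, 3],
    [1, 4, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
print(render_minimap(screen, player_pos=(2, 1)))
```

Whatever the real implementation looks like, the effect is the same: the model gets structured game state rather than having to parse it out of pixels, which is exactly the kind of help a stock setup wouldn't provide.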
Now, Pokémon is a semi-serious benchmark at best; few would argue it's a very informative test of a model's capabilities. But it is an instructive example of how different implementations of a benchmark can influence the results.
For example, Anthropic reported two scores for its latest model, Claude 3.7 Sonnet, on SWE-bench Verified, a benchmark designed to evaluate a model's coding abilities. Claude 3.7 Sonnet achieved 62.3% accuracy on SWE-bench Verified, but 70.3% with a "custom scaffold" that Anthropic developed.
More recently, Meta tuned a version of one of its newer models, Llama 4 Maverick, to perform well on a particular benchmark, LM Arena. The vanilla version of the model scores significantly worse on the same evaluation.
Given that AI benchmarks, Pokémon included, are imperfect measures to begin with, custom and non-standard implementations threaten to muddy the waters further. That is to say, it doesn't seem likely that comparing models will get any easier as they're released.