When a company releases a new AI video generator, it’s not long before someone uses it to make a video of actor Will Smith eating spaghetti.
It’s become something of a meme, as well as a benchmark: seeing whether a new video generator can convincingly render Smith slurping down a bowl of noodles. Smith himself parodied the trend in an Instagram post in February.
Google’s Veo 2 is the latest to take its turn.
We are finally eating spaghetti. pic.twitter.com/AZO81w8JC0
— Jerrod Lew (@jerrod_lew) December 17, 2024
Will Smith and pasta is just one of several weird “unofficial” benchmarks to grip the AI community in 2024. A 16-year-old developer built an app that gives AI control over Minecraft and tests its ability to design structures. Elsewhere, a British programmer created a platform where AIs play games like Pictionary and Connect 4 against each other.
It’s not as if there aren’t more academic tests of an AI’s performance. So why have the weirdest ones taken off?
For one, many of the AI industry’s benchmarks don’t tell the average person much. Companies often cite their AI’s ability to answer questions on Math Olympiad exams, or to find reliable solutions to Ph.D.-level problems. Yet most people – including yours truly – use chatbots for things like answering emails and basic research.
Crowdsourced industry benchmarks are not necessarily better or more informative.
Take, for example, Chatbot Arena, a public benchmark that many AI enthusiasts and developers follow obsessively. Chatbot Arena lets anyone on the web rate how well AI models perform on specific tasks, such as creating a web application or generating an image. But raters tend to be unrepresentative – most come from AI and tech industry circles – and cast their votes based on personal, hard-to-pin-down preferences.

Ethan Mollick, a management professor at Wharton, recently pointed out in a post on X another problem with many AI industry benchmarks: they don’t compare a system’s performance to that of an average person.
“The fact that there aren’t 30 different benchmarks from different organizations in medicine, law, quality of advice, and so on is a real shame, since people are using AI systems for these things regardless,” Mollick wrote.
Weird AI benchmarks like Connect 4, Minecraft, and Will Smith eating spaghetti are certainly not empirical – or even all that generalizable. Just because an AI passes the Will Smith test doesn’t mean it will convincingly generate, say, a burrito.

One expert I spoke to about AI benchmarks suggested that the AI community focus on AI’s downstream impacts rather than its abilities in narrow domains. That’s reasonable. But I have a feeling weird benchmarks aren’t going away anytime soon. Not only are they fun – who doesn’t love watching AI build Minecraft castles? – but they’re easy to understand. And as my colleague Max Zeff recently wrote, the industry continues to struggle with distilling a technology as complex as AI into digestible marketing.
The only question on my mind is: which weird new benchmarks will go viral in 2025?