A new, challenging AGI test stumps most AI models

The Arc Prize Foundation, a nonprofit co-founded by prominent AI researcher François Chollet, announced in a blog post on Monday that it has created a new, challenging test to measure the general intelligence of leading AI models.

So far, the new test, called ARC-AGI-2, has stumped most models.

“Reasoning” AI models like OpenAI’s o1-pro and DeepSeek’s R1 score between 1% and 1.3% on ARC-AGI-2, according to the Arc Prize leaderboard. Powerful non-reasoning models, including GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Flash, score around 1%.

The ARC-AGI tests consist of puzzle-like problems in which an AI has to identify visual patterns in a collection of different-colored squares and generate the correct “answer” grid. The problems were designed to force an AI to adapt to new problems it has not seen before.
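To make the puzzle format concrete, here is a minimal sketch in Python. It assumes the layout used by the publicly released ARC tasks, where each task holds a few "train" input/output grid pairs and a "test" input, and each grid is a list of rows of integer color codes. The toy task, the `mirror_left_right` rule, and the `fits_all_examples` helper are hypothetical illustrations, not actual ARC-AGI-2 content or Arc Prize code.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]  # each integer stands for the color of one square

# Hypothetical toy task: two worked examples plus one unanswered test input.
toy_task: Dict[str, list] = {
    "train": [
        {"input": [[1, 0], [0, 0]], "output": [[0, 1], [0, 0]]},
        {"input": [[2, 2, 0], [0, 0, 3]], "output": [[0, 2, 2], [3, 0, 0]]},
    ],
    "test": [{"input": [[0, 4, 0], [5, 0, 0]]}],
}


def mirror_left_right(grid: Grid) -> Grid:
    """Candidate rule: flip every row horizontally."""
    return [list(reversed(row)) for row in grid]


def fits_all_examples(rule: Callable[[Grid], Grid], task: Dict[str, list]) -> bool:
    """Return True if the rule reproduces every worked example exactly."""
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])


# A solver must infer the rule from the examples, then apply it to the test grid.
if fits_all_examples(mirror_left_right, toy_task):
    answer = mirror_left_right(toy_task["test"][0]["input"])
    print(answer)
```

Real tasks are far harder than this mirror rule: the transformation is not known in advance and differs from task to task, which is what forces a solver to adapt rather than recall.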

The Arc Prize Foundation had over 400 people take ARC-AGI-2 to establish a human baseline. On average, “panels” of these people got 60% of the test’s questions right, much better than any of the models’ scores.

A sample question from ARC-AGI-2 (Credit: Arc Prize).

In a post on X, Chollet claimed that ARC-AGI-2 is a better measure of an AI model’s actual intelligence than the first iteration of the test, ARC-AGI-1. The Arc Prize Foundation’s tests are intended to evaluate whether an AI system can efficiently acquire new skills outside the data it was trained on.

Chollet said that unlike ARC-AGI-1, the new test prevents AI models from relying on “brute force” (extensive computing power) to find solutions. Chollet previously acknowledged that this was a major flaw of ARC-AGI-1.

To address the first test’s flaws, ARC-AGI-2 introduces a new metric: efficiency. It also requires models to interpret patterns on the fly instead of relying on memorization.

“Intelligence is not solely defined by the ability to solve problems or achieve high scores,” Arc Prize Foundation co-founder Greg Kamradt wrote in a blog post. “The efficiency with which those capabilities are acquired and deployed is a crucial, defining component. The core question being asked is not just, ‘Can AI acquire [the] skill to solve a task?’ but also, ‘At what efficiency or cost?’”

ARC-AGI-1 was unbeaten for roughly five years until December 2024, when OpenAI released its advanced reasoning model, o3, which outperformed all other AI models and matched human performance on the evaluation. However, as we noted at the time, o3’s performance gains on ARC-AGI-1 came with a hefty price tag.

The version of OpenAI’s o3 model, o3 (low), that was first to reach new heights on ARC-AGI-1, scoring 75.7% on the test, scored just 4% on ARC-AGI-2 while using $200 worth of computing power per task.

Comparison of frontier AI model performance on ARC-AGI-1 and ARC-AGI-2 (Credit: Arc Prize).

The arrival of ARC-AGI-2 comes as many in the tech industry are calling for new, unsaturated benchmarks to measure AI progress. Hugging Face’s co-founder, Thomas Wolf, recently told TechCrunch that the AI industry lacks sufficient tests to measure the key traits of so-called artificial general intelligence, including creativity.

Alongside the new benchmark, the Arc Prize Foundation announced a new Arc Prize 2025 contest, challenging developers to reach 85% accuracy on the ARC-AGI-2 test while spending only $0.42 per task.
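As a rough illustration of how far current results sit from that bar, the sketch below checks a result against the two figures quoted in this article (85% accuracy, $0.42 per task). The `Result` class and the helper functions are hypothetical names chosen for this example; the contest's actual scoring rules are not described here and may differ.

```python
from dataclasses import dataclass

TARGET_ACCURACY = 0.85    # contest goal quoted in the article
MAX_COST_PER_TASK = 0.42  # US dollars per task, quoted in the article


@dataclass
class Result:
    name: str
    accuracy: float       # fraction of tasks solved, from 0.0 to 1.0
    cost_per_task: float  # US dollars spent per task


def meets_target(result: Result) -> bool:
    """True only if the result is both accurate enough and cheap enough."""
    return (
        result.accuracy >= TARGET_ACCURACY
        and result.cost_per_task <= MAX_COST_PER_TASK
    )


def cost_overrun(result: Result) -> float:
    """How many times over the per-task budget the result is."""
    return result.cost_per_task / MAX_COST_PER_TASK


# Figures reported above for o3 (low) on ARC-AGI-2: 4% at $200 per task.
o3_low = Result(name="o3 (low)", accuracy=0.04, cost_per_task=200.0)

print(meets_target(o3_low))
print(round(cost_overrun(o3_low)))
```

Treating accuracy and cost as a joint condition mirrors the article's point that a high score alone no longer counts: a system also has to reach it cheaply.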
