DeepSeek claims its reasoning model outperforms OpenAI's o1 in several benchmarks

Chinese artificial intelligence lab DeepSeek has released an open source version of DeepSeek-R1, its so-called reasoning model, which it claims performs as well as OpenAI’s o1 on several AI benchmarks.

R1 is available from the Hugging Face AI developer platform under an MIT license, meaning it can be used commercially without restrictions. According to DeepSeek, R1 beats o1 on the AIME, MATH-500 and SWE-bench Verified benchmarks. AIME draws on problems from the American Invitational Mathematics Examination, a high school math competition, while MATH-500 is a collection of word problems. SWE-bench Verified, meanwhile, focuses on programming tasks.

Being a reasoning model, R1 effectively fact-checks itself, which helps it avoid some of the pitfalls that usually trip up models. Reasoning models take slightly longer (usually seconds to minutes longer) to reach a solution than a typical non-reasoning model. The upside is that they tend to be more reliable in areas such as physics, science and math.

R1 contains 671 billion parameters, DeepSeek revealed in a technical report. Parameters roughly correspond to a model’s problem-solving abilities, and models with more parameters generally perform better than those with fewer parameters.

671 billion parameters is massive, but DeepSeek also released “distilled” versions of R1 that range in size from 1.5 billion to 70 billion parameters. The smallest can run on a laptop. The full R1 requires stronger hardware, but it is available through DeepSeek’s API at prices 90% to 95% cheaper than OpenAI’s o1.

R1 has a downside. Being a Chinese model, it is subject to benchmarking by China’s internet regulator to ensure its responses “embody core socialist values”. R1 will not answer questions about Tiananmen Square, for example, or Taiwan’s autonomy.

R1 filtering in action. (Image credits: DeepSeek)

Many Chinese AI systems, including other reasoning models, refuse to respond to topics that could raise the ire of regulators at home, such as speculation about the Xi Jinping regime.

R1 comes days after the outgoing Biden administration proposed tougher export rules and restrictions on AI technologies for Chinese enterprises. Companies in China were already barred from buying advanced AI chips, but if the new rules go into effect as written, companies will face tighter restrictions on both the semiconductor technology and the designs needed to enable sophisticated AI systems.

In a policy document last week, OpenAI urged the US government to support US AI development, lest Chinese models match or surpass American ones in capability. In an interview with The Information, OpenAI’s vice president of policy, Chris Lehane, singled out High Flyer Capital Management, DeepSeek’s corporate parent, as an organization of particular concern.

So far, at least three Chinese labs (DeepSeek, Alibaba and Kimi, which is owned by Chinese unicorn Moonshot AI) have produced models they claim rival o1. DeepSeek was the first: it announced a preview of R1 in late November. In a post on X, Dean Ball, an AI researcher at George Mason University, said the trend suggests that Chinese labs will continue to be “fast followers.”

“The impressive performance of DeepSeek’s distilled models (…) means that highly skilled reasoners will continue to be widespread and able to run on local hardware,” Ball wrote, “away from the eyes of any top-down control regime.”