Trends Today
OpenAI's o3 suggests AI models are scaling in new ways, but so are the costs

By Editor Team, December 25, 2024 (Tech)


Last month, AI founders and investors told TechCrunch that we're now in the "second era of scaling laws," noting how established methods of improving AI models were showing diminishing returns. One promising new method they suggested could keep the gains coming was "test-time scaling," which appears to be behind the performance of OpenAI's o3 model, but it comes with drawbacks of its own.

Much of the AI world took the announcement of OpenAI's o3 model as proof that AI scaling progress hasn't "hit a wall." The o3 model does well on benchmarks. It significantly outscored all other models on a test of general ability called ARC-AGI, and it scored 25% on a difficult math test on which no other AI model had scored more than 2%.

Of course, we at TechCrunch are taking all of this with a grain of salt until we can test o3 ourselves (very few have so far). But even before o3's release, the AI world is already convinced that something big has shifted.

Noam Brown, a co-creator of OpenAI's o-series of models, noted on Friday that the startup was announcing o3's impressive gains just three months after it announced o1. That is a relatively short time frame for such a jump in performance.

“We have every reason to believe this trajectory will continue,” Brown said in a tweet.

Anthropic co-founder Jack Clark said in a blog post on Monday that o3 is evidence that AI "progress will be faster in 2025 than in 2024." (Keep in mind that it benefits Anthropic, especially its ability to raise capital, to suggest that AI scaling laws are still holding, even if Clark is complimenting a competitor.)

In the coming year, Clark says, the AI world will combine test-time scaling with traditional pre-training scaling methods to get even greater returns out of AI models. Perhaps he's suggesting that Anthropic and other AI model providers will release reasoning models of their own in 2025, just as Google did last week.

Test-time scaling means OpenAI is using more compute during ChatGPT's inference phase, the period after you press enter on a prompt. It's not clear exactly what is happening behind the scenes. OpenAI may be using more computer chips to answer a user's question, running more powerful inference chips, or running those chips for longer periods of time (10 to 15 minutes in some cases) before the AI produces an answer. We don't know all the details of how o3 was made, but these benchmarks are early signs that test-time scaling may work to improve the performance of AI models.

While o3 may give some people renewed faith in the progress of AI scaling laws, OpenAI's newest model also uses a previously unseen level of compute, which means a higher price per answer.

"Perhaps the only important caveat here is understanding that one reason why O3 is so much better is that it costs more money to run at inference time: the ability to utilize test-time compute means on some problems you can turn compute into a better answer," Clark writes in his blog. He adds that this makes the cost of running AI systems less predictable: "previously, you could work out how much it cost to serve a generative model by just looking at the model and the cost to generate a given output."

Clark and others pointed to o3's performance on the ARC-AGI benchmark, a difficult test used to evaluate breakthroughs toward AGI, as an indicator of its progress. It's worth noting that, according to its creators, passing this test does not mean an AI model has achieved AGI. It is one way to measure progress toward that nebulous goal. That said, o3 outscored all previous AI models that had taken the test, reaching 88% in one of its attempts. OpenAI's next-best AI model, o1, scored just 32%.

Graph showing the performance of OpenAI's o-series models on the ARC-AGI test. Image credits: ARC Prize

But the logarithmic x-axis on this chart may be alarming to some. The high-scoring version of o3 used more than $1,000 worth of compute for every task. The o1 models used about $5 of compute per task, and o1-mini used just a few cents.

François Chollet, the creator of the ARC-AGI benchmark, writes in a blog post that OpenAI used roughly 170 times more compute to generate that 88% score than the high-efficiency version of o3 used, and that version scored only about 12 percentage points lower. The high-scoring version of o3 used more than $10,000 of resources to complete the test. That makes it too expensive to compete for the ARC Prize, an as-yet-unbeaten competition for AI models to beat the ARC test.
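The following back-of-the-envelope calculation puts the quoted figures side by side. It uses only numbers reported in this article: roughly $5 per task for a human (Chollet), about $5 per task for o1, more than $1,000 per task for high-compute o3, the 88% score, the roughly 12-point gap, and the roughly 170x compute multiplier. The o3 cost is a lower bound and the scores are rounded, so the results are approximate floors, not prices. The constant and function names are illustrative only.

```python
# Figures as quoted in the article. The o3 per-task cost is reported as
# "more than $1,000", so every ratio computed from it is a lower bound.
O3_HIGH_COMPUTE_COST_PER_TASK = 1_000.0  # USD, lower bound
O1_COST_PER_TASK = 5.0                   # USD, "about $5"
HUMAN_COST_PER_TASK = 5.0                # USD, Chollet's "roughly $5"
O3_HIGH_COMPUTE_SCORE = 88.0             # percent on ARC-AGI (rounded)
SCORE_GAP_POINTS = 12.0                  # high-efficiency o3 scored ~12 points lower
COMPUTE_MULTIPLIER = 170.0               # high-compute vs high-efficiency o3


def cost_multiple(cost: float, baseline: float) -> float:
    """How many times more expensive `cost` is than `baseline`."""
    return cost / baseline


def summarize() -> dict[str, float]:
    """Collect the comparisons implied by the article's figures."""
    return {
        "o3_vs_human_min": cost_multiple(O3_HIGH_COMPUTE_COST_PER_TASK, HUMAN_COST_PER_TASK),
        "o3_vs_o1_min": cost_multiple(O3_HIGH_COMPUTE_COST_PER_TASK, O1_COST_PER_TASK),
        # Approximate score of the high-efficiency configuration.
        "high_efficiency_score": O3_HIGH_COMPUTE_SCORE - SCORE_GAP_POINTS,
        "compute_multiplier": COMPUTE_MULTIPLIER,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.1f}")
```

At the article's own numbers, high-compute o3 costs at least 200 times what a human or o1 costs per ARC-AGI task. That spend bought roughly 12 extra points over a version of o3 that used about 1/170 of the compute.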

Still, Chollet says o3 was a breakthrough for AI models.

"o3 is a system capable of adapting to tasks it has never encountered before, arguably approaching human-level performance in the ARC-AGI domain," Chollet said in the blog. "Of course, such generality comes at a steep cost, and wouldn't quite be economical yet: You could pay a human to solve ARC-AGI tasks for roughly $5 per task (we know, we did that), while consuming mere cents in energy."

It's too early to dwell on the exact pricing of all this. Prices for AI models have dropped sharply in the past year, and OpenAI has yet to announce how much o3 will actually cost. Still, these prices show how much compute is required to break, even slightly, the performance barriers set by today's leading AI models.

This raises several questions. What is o3 actually for? And how much more compute will be needed to make further gains with o4, o5, or whatever OpenAI names its future reasoning models?

It doesn't seem like o3, or its successors, would be anyone's "daily driver" the way GPT-4o or Google Search might be. These models simply use too much compute to answer the small questions that come up throughout your day, such as, "How can the Cleveland Browns still make the 2024 playoffs?"

Instead, it seems like AI models with scaled test-time compute may only be good for big-picture prompts such as, "How can the Cleveland Browns become a Super Bowl franchise in 2027?" Even then, the high compute costs are probably only worth it if you're the general manager of the Cleveland Browns and you're using these tools to make some big decisions.

Institutions with deep pockets may be the only ones that can afford o3, at least at first, as Wharton professor Ethan Mollick notes in a tweet.

o3 seems too expensive for most uses. But for work in academia, finance, and many industrial problems, paying hundreds or even thousands of dollars for a successful answer would not be prohibitive. If it is generally reliable, o3 will have multiple use cases even before costs drop.

— Ethan Mollick (@emollick) December 22, 2024

We've already seen OpenAI release a $200 tier to access a high-compute version of o1, and the startup is reportedly considering subscription plans that cost as much as $2,000. When you see how much compute o3 uses, you can understand why OpenAI would consider it.

But there are drawbacks to using o3 for high-impact work. As Chollet notes, o3 is not AGI, and it still fails at some very easy tasks that a human would handle without trouble.

This isn't necessarily surprising, since large language models still have a major hallucination problem, and o3 and test-time compute don't seem to have solved it. That's why ChatGPT and Gemini include disclaimers below every answer they produce, asking users not to take answers at face value. Presumably AGI, if it is ever reached, would not need such a disclaimer.

One way to unlock more gains from test-time scaling could be better AI inference chips. There's no shortage of startups working on exactly this, such as Groq and Cerebras. Other startups, such as MatX, are designing more cost-efficient AI chips. Andreessen Horowitz general partner Anjney Midha previously told TechCrunch that he expects these startups to play a bigger role in test-time scaling going forward.

o3 is a notable improvement in the performance of AI models, but it raises several new questions about usage and costs. That said, o3's performance does add credence to the claim that test-time compute is the tech industry's next best way to scale AI models.



