Trends Today

These researchers used NPR Sunday Puzzle questions to benchmark AI "reasoning" models

By Editor Team | February 17, 2025 | Tech

Every Sunday, NPR host Will Shortz, The New York Times' crossword puzzle guru, gets to quiz thousands of listeners in a long-running segment called the Sunday Puzzle. While written to be solvable without too much foreknowledge, the brainteasers are usually challenging even for skilled contestants.

That is why some experts think they are a promising way to test the limits of AI's problem-solving abilities.

In a recent study, a team of researchers from Wellesley College, Oberlin College, the University of Texas at Austin, Northeastern University, Charles University, and the startup Cursor created an AI benchmark using riddles from Sunday Puzzle episodes. The team says their test uncovered surprising insights, such as that reasoning models (OpenAI's o1, among others) sometimes "give up" and provide answers they know are not correct.

“We wanted to develop a benchmark with problems that humans can understand with only general knowledge,” Arjun Guha, a computer science faculty member at Northeastern and one of the co-authors of the study, told TechCrunch.

The AI industry is in a bit of a benchmarking quandary at the moment. Most of the tests commonly used to evaluate AI models probe for skills, such as competency on PhD-level math and science questions, that are not relevant to the average user. Meanwhile, many benchmarks, even ones released relatively recently, are quickly approaching the saturation point.

The advantage of a public radio quiz game like the Sunday Puzzle is that it does not test for esoteric knowledge, and the challenges are phrased such that models cannot draw on “rote memory” to solve them, Guha explained.

“I think what makes these problems hard is that it is really difficult to make meaningful progress on a problem until you solve it. That is when everything clicks together all at once,” Guha said. “That requires a combination of insight and a process of elimination.”

No benchmark is perfect, of course. The Sunday Puzzle is U.S.-centric and English-only. And because the quizzes are publicly available, it is possible that models trained on them can “cheat” in a sense, though Guha says he has not seen evidence of this.

“New questions are released every week, and we can expect the latest questions to be truly unseen,” he added. “We intend to keep the benchmark fresh and track how model performance changes over time.”

On the researchers' benchmark, which consists of about 600 Sunday Puzzle riddles, reasoning models such as o1 and DeepSeek's R1 far outperform the rest. Reasoning models thoroughly fact-check themselves before giving results, which helps them avoid some of the pitfalls that normally trip up AI models. The trade-off is that reasoning models take a little longer to arrive at a solution, typically seconds to minutes longer.

At least one model, DeepSeek's R1, gives solutions it knows to be wrong for some of the Sunday Puzzle questions. R1 will state verbatim “I give up,” followed by an incorrect answer chosen seemingly at random, behavior this human can certainly relate to.

The models make other strange choices, like giving a wrong answer only to retract it immediately, attempt to tease out a better one, and fail again. They also get stuck “thinking” forever and give nonsensical explanations for their answers, or they arrive at a correct answer right away but then go on to consider alternative answers for no clear reason.

“On hard problems, R1 literally says that it is getting ‘frustrated,’” Guha said. “It was funny to see how a model emulates what a human might say. It remains to be seen how ‘frustration’ in reasoning can affect the quality of model results.”

R1 getting “frustrated” on a question in the Sunday Puzzle challenge set. Image credits: Guha et al.

The current best-performing model on the benchmark is o1 with a score of 59%, followed by the recently released o3-mini set to high “reasoning effort” (47%). (R1 scored 35%.) As a next step, the researchers plan to broaden their testing to additional reasoning models, which they hope will help identify areas where these models can be improved.

The scores of the models that the team tested on their benchmark. Image credits: Guha et al.

“You do not need a PhD to be good at reasoning, so it should be possible to design reasoning benchmarks that do not require PhD-level knowledge,” Guha said. “A benchmark with broader access allows a wider set of researchers to comprehend and analyze the results, which may in turn lead to better solutions in the future. Furthermore, as state-of-the-art models are increasingly deployed in settings that affect everyone, we believe everyone should be able to intuit what these models are and are not capable of.”
