Cursor AI Agents Solve a Research-Level Math Challenge After Running Autonomously for 4 Days

The company says the system tackled Problem Six of the First Proof challenge, a research-level mathematics benchmark.

Supreeth Koundinya

MARCH 4, 2026, 8:14 AM

Cursor claims its AI system has produced a novel solution to a research-level mathematics problem after running autonomously for four days.

The company said the system tackled Problem Six of the First Proof challenge, a research-level mathematics benchmark comprising 10 previously unpublished problems contributed by mathematicians and designed to test whether AI can produce original proofs.

According to Cursor co-founder Michael Truell, the agents generated a proof that may improve on the official human-written solution. The system ran continuously for four days, exploring approaches without prompts or hints.

“We believe Cursor discovered a novel solution to Problem Six of the First Proof challenge, a set of math research problems that approximate the work of Stanford, MIT, Berkeley academics,” Truell wrote on X. “Cursor's solution yields stronger results than the official, human-written solution.”

The result has not yet been academically verified. However, Cursor says early feedback from experts suggests the proof may be valid.

“We're still waiting for final expert review from Nikhil Srivastava (Associate Professor of Mathematics, UC Berkeley) or Daniel Spielman (Sterling Professor of Computer Science, Yale University), but we have received feedback from spectral graph theory expert Yang Liu that our proof is likely correct,” Truell said, adding that Stanford mathematician Jan Vondrák also reviewed the solution and found it appeared correct to the best of his knowledge.

The proof reportedly relies on a technique from spectral graph theory rather than the approach taken in the official solution, and Truell said it yields stronger guarantees. “It used the Marcus-Spielman-Srivastava interlacing polynomial method, a different approach from existing solutions,” he wrote.

“Two concrete improvements: the constant c goes from 0.03 to 0.13, and the solution partitions the entire vertex set into light components rather than just a subset.”

Cursor said the experiment suggests that systems originally designed for large-scale software engineering could generalise to research problems beyond coding.

The company ran the experiment using the same multi-agent harness it recently described in a research post about autonomous coding. The system coordinates hundreds of AI agents through a structured workflow that separates planning and execution roles.

In that setup, planner agents generate and refine tasks while worker agents execute them independently. The architecture was previously used in large-scale experiments in which agents collaborated for weeks to write millions of lines of code, including building a web browser from scratch.
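The article does not include any of Cursor's code, so the following is only a toy sketch of the planner/worker split it describes, assuming a simple loop: a planner hands out tasks, workers execute them independently and in parallel, and the planner refines any task a worker rejects. Plain Python functions stand in for the AI agents, and the prime-counting job, `MAX_TASK_SIZE`, and every function name here are hypothetical illustrations, not parts of Cursor's harness.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical task type: count the primes in the half-open interval [lo, hi).
@dataclass(frozen=True)
class Task:
    lo: int
    hi: int

MAX_TASK_SIZE = 50  # illustrative limit on how much work one worker accepts


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def worker(task: Task) -> Tuple[Task, Optional[int]]:
    """Execute one task on its own; return None to reject an oversized task."""
    if task.hi - task.lo > MAX_TASK_SIZE:
        return task, None
    return task, sum(is_prime(n) for n in range(task.lo, task.hi))


def planner_refine(task: Task) -> List[Task]:
    """Planner step: split a rejected task into two smaller tasks."""
    mid = (task.lo + task.hi) // 2
    return [Task(task.lo, mid), Task(mid, task.hi)]


def run(goal: Task, num_workers: int = 8) -> int:
    pending = [goal]  # the planner's initial plan: the whole goal as one task
    total = 0
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        while pending:
            # Workers execute the current batch in parallel, unaware of each other.
            results = list(pool.map(worker, pending))
            pending = []
            for task, value in results:
                if value is None:
                    # The planner refines rejected work and queues the pieces.
                    pending.extend(planner_refine(task))
                else:
                    total += value
    return total


if __name__ == "__main__":
    print(run(Task(0, 1000)))  # 168 primes below 1000
```

In this sketch, workers never communicate with one another; all coordination passes through the planner's task list, so the size of the worker pool can change without touching the planning logic.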

About the Author

### Supreeth Koundinya
Contributor

Claude Opus 4.6 Surprises Turing Award Winner, Solves Problem He’d Been Working on for Weeks

“It seems that I’ll have to revise my opinions about ‘generative AI’ one of these days,” said Donald Knuth, widely regarded as the father of algorithm analysis.

Supreeth Koundinya

MARCH 4, 2026, 8:06 AM

Computer scientist Donald Knuth, often regarded as the father of algorithm analysis, said in a recent note that Anthropic’s Claude Opus 4.6 has solved a mathematical problem he had been working on for several weeks.

“Shock! Shock! I learned yesterday that an open problem I’d been working on for several weeks had just been solved by Claude Opus 4.6,” Knuth wrote.

He added that the result made him reconsider his scepticism toward generative AI. “It seems that I’ll have to revise my opinions about ‘generative AI’ one of these days,” he said.

“What a joy it is to learn not only that my conjecture has a nice solution but also to celebrate this dramatic advance in automatic deduction and creative problem solving.”

The problem emerged while Knuth was preparing material for a future volume of his landmark series, The Art of Computer Programming.

In simple terms, it asked whether a certain type of mathematical network could always be split into three loops, each of which passes through every point exactly once.
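In graph-theory terms, a loop that passes through every point exactly once is a Hamiltonian cycle. The note's actual network and Claude's construction are not described here, so the sketch below only shows what checking a candidate answer involves, under stated assumptions: the network is directed, each loop is given as a map from every point to the next one, and "split into" means every connection belongs to exactly one loop. The toy network (7 points, where point i connects to i+1, i+2 and i+3 mod 7) and the function names are illustrative, not Knuth's.

```python
def is_hamiltonian_cycle(successor: dict, vertices: set) -> bool:
    """Check that `successor` describes one loop visiting every vertex exactly once."""
    # Each vertex must be left exactly once and entered exactly once.
    if not vertices or set(successor) != vertices or set(successor.values()) != vertices:
        return False
    # Walk the loop from an arbitrary start and count the steps until it closes.
    start = next(iter(vertices))
    v, steps = successor[start], 1
    while v != start:
        v = successor[v]
        steps += 1
    return steps == len(vertices)


def is_hamiltonian_decomposition(arcs, cycles, vertices) -> bool:
    """Check that the loops are Hamiltonian and together use every arc exactly once."""
    used = []
    for successor in cycles:
        if not is_hamiltonian_cycle(successor, vertices):
            return False
        used.extend(successor.items())
    # No arc may be reused, and no loop may use an arc the network does not have.
    return len(used) == len(set(used)) and set(used) == set(arcs)


if __name__ == "__main__":
    m = 7
    vertices = set(range(m))
    steps = (1, 2, 3)
    arcs = {(i, (i + s) % m) for i in vertices for s in steps}
    cycles = [{i: (i + s) % m for i in vertices} for s in steps]
    print(is_hamiltonian_decomposition(arcs, cycles, vertices))  # True
```

Verifying a proposed split is mechanical; the hard part, which the note credits to Claude's experiments, is finding a rule that produces valid loops for every size of the network.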

Knuth had solved the smallest case, while mathematician Filip Stappers had discovered working examples for several larger ones through experimentation.

However, no general rule explaining why the pattern worked had been found.

Stappers then submitted the problem to Claude Opus 4.6 and asked the model to document its progress as it explored different approaches.

According to Knuth’s note, the model carried out dozens of experiments, writing and running small programs, testing ideas, and gradually refining its strategy.

Eventually, Claude discovered a rule-based method that generates the required cycles. Tests showed that the approach works for every odd value of the parameter that was tried, with cases as large as 101. Knuth later verified the construction and produced a mathematical proof confirming the result.

The problem remains open for even values of the parameter. While the model reportedly found solutions for a few specific instances, it did not identify a general pattern.

Knuth described the episode as an impressive demonstration of how modern AI systems may assist in exploratory mathematics by combining coding, experimentation, and pattern discovery.

Knuth, a professor emeritus at Stanford University, received the Turing Award in 1974—often described as the Nobel Prize of computing—for his foundational contributions to algorithm analysis and for shaping the discipline through his writing.

The Art of Computer Programming series, first published in 1968, systematically catalogues algorithms, data structures, and mathematical techniques underlying modern software, and is widely considered one of the defining texts of computer science.
