
February 9, 2026 by Paul Arnold, Phys.org
Collected at: https://phys.org/news/2026-02-ai-struggle-math-problems.html#goog_rewarded
Mathematicians, like researchers in many other scientific fields, are increasingly using artificial intelligence. Math is the backbone of AI, of course, but mathematicians are now also turning to these tools for tasks like literature searches and checking manuscripts for errors. But how well can AI perform when it comes to solving genuine, high-level research problems?
There is still no widely accepted, realistic methodology for assessing AI’s ability to solve math problems at this level. So a group of mathematicians decided to put the machines to the test, as they detail in a study available on the arXiv preprint server.
Previous attempts at testing AI have used math contest problems and questions already found in textbooks. What makes this study different is that the questions the programs faced were drawn from the mathematicians’ own research. They had never been posted or published online, which means the models could not simply have memorized the answers from their training data.
Testing the machines
Each mathematician taking part in the study contributed a unique problem and solved it themselves first to show that it was solvable. The team also encrypted the answers so that they would not appear in public sources that models could access.
There were ten problems overall across diverse mathematical fields, including stochastic analysis, spectral graph theory, symplectic geometry, and algebraic topology. The researchers tested the questions on several leading systems, including GPT-5.1 Pro and Gemini 3 Pro, and the models were given only one attempt per question. There were no additional prompts or conversations, nor any hints that might help them reach a solution.
The experiment, called First Proof, was designed to test a specific part of the mathematical process. As the researchers commented in their paper, “Our ‘first proof’ experiment is focused on the final and most well-specified stage of math research, in which the question and frameworks are already understood.”
AI struggles
The results can allay the fears of anyone concerned that AI is poised to replace mathematicians. While AI programs are excellent at summarizing existing knowledge or finding patterns in data, the models struggled to solve the problems in a single attempt.
The researchers’ overall conclusion is that, at the moment, AI is good at contest-like tasks but lacks the creative depth and intuition needed to navigate and solve the unknown.
Next up for the team is releasing the encrypted solutions on February 13, then beginning work on a second set of problems. They want to turn First Proof into a permanent benchmark that will continue to challenge AI, noting, “We hope to use this understanding to design a more formal benchmark.”
Publication details
Mohammed Abouzaid et al, First Proof, arXiv (2026). DOI: 10.48550/arxiv.2602.05192
Journal information: arXiv
