    Google researchers claim new breakthrough in getting AI to solve tough high school math problems

    By Jeremy Kahn

    23 hours ago


    Google DeepMind says it has achieved a breakthrough in building an AI system that can handle complex mathematical problems.

    The research division, which is part of Alphabet-owned Google, announced Thursday that it has created a software system that combines multiple AI models and can score well enough on the International Mathematical Olympiad (IMO), a global test of high school students’ mathematical talents, to place in the top quartile of contestants. That performance would be good enough to earn a silver medal in the competition.

    Beyond being a milestone in the machine-versus-mathlete contest, the feat opens new possibilities for combining different approaches to AI to create more capable hybrid systems, something Google said could eventually make its way into commercial products such as its lineup of Gemini AI tools.

    The news builds on a system the AI research lab unveiled in January, called AlphaGeometry, which could solve geometry problems from the IMO about as well as top high school students. The new system, which combines a new model called AlphaSolver with an updated and improved AlphaGeometry 2, can tackle many different kinds of mathematical problems and develop sophisticated answers.

    The new system managed to answer the most difficult IMO question, which only five of the 609 human contestants in last week’s competition solved. That said, the new system is not perfect: on two of the six problems in the IMO it failed to find a solution, and on one problem it took three days to reach the correct answer. Human competitors have to solve three questions in four-and-a-half hours, so they can average no more than 90 minutes per question.

    Google DeepMind researchers said the new system is a step toward more powerful AI models that will be able to plan and reason about complex tasks, although they cautioned that the method works best in situations where there is a clear way to determine whether an output is valid. This is the case, for example, in software coding, where code will only compile and run if it is valid. David Silver, one of the Google DeepMind researchers who worked on the new system, said it might also work in areas where humans could provide unambiguous feedback about whether the solution the AI produced was a good one.
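
    To make that concrete, here is a minimal, hypothetical sketch (not Google DeepMind’s code) of the pattern the researchers describe: a generator proposes many candidate answers, and an automatic checker keeps only those it can verify. The equation and both functions below are invented for illustration.

    ```python
    # Illustrative pattern: pair a generator with an unambiguous verifier.
    # Here the "problem" is finding integer roots of x^2 - 5x + 6 = 0.

    def generate_candidates():
        """Stand-in for a model proposing answers; here, brute force."""
        return range(-10, 11)

    def is_valid(x: int) -> bool:
        """Unambiguous check: does x actually satisfy the equation?"""
        return x * x - 5 * x + 6 == 0

    verified = [x for x in generate_candidates() if is_valid(x)]
    print(verified)  # [2, 3]: only answers the checker confirms survive
    ```

    The same pattern extends to compilers and proof checkers; what matters is that the validity test leaves no room for argument.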

    Google DeepMind said it would incorporate insights from the new system into future versions of its Gemini AI models, although it did not say exactly how this would be done or how soon Gemini might see these upgraded mathematical abilities.

    Silver acknowledged that in many real-world situations the validity of an answer is highly subjective, or the soundness of a solution can only be determined after a long period of time. He said this would make it harder to apply the methods Google DeepMind used for the IMO problems to these kinds of real-world problems.

    AlphaZero to the rescue

    Unlike other well-known AI models that consist of a single large neural network—a kind of AI software loosely based on the human brain—the AlphaSolver system involves multiple neural networks, each performing different functions.

    A large language model (LLM), in this case Google’s Gemini model, is used as one part of the process. But the LLM does not itself do the mathematical reasoning. The LLMs that underpin popular AI chatbots, such as Gemini, OpenAI’s ChatGPT, Anthropic’s Claude, and Meta’s AI chatbot, have struggled to solve math problems unless given access to outside tools, such as calculators or specialized math software.

    Instead, the LLM is fine-tuned to translate text-based mathematical problems into a formal mathematical language. It then passes the problem to a different AI model, Google DeepMind’s AlphaZero, which was developed in 2017 and originally used to learn to play the strategy board games chess, Go, and shogi at superhuman levels. But it turns out that AlphaZero can be used to puzzle out all kinds of problems in systems with clear rules and an easy way of keeping score.
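
    As an illustration of what translating into a “formal mathematical language” means, below is a simple, hypothetical Lean example (not one of the competition problems): the informal claim that the sum of two even integers is even, written so that a proof checker can mechanically confirm every step.

    ```lean
    -- Hypothetical example, not an IMO problem: "the sum of two even
    -- integers is even", stated formally. Each step either checks
    -- mechanically or the proof is rejected.
    theorem even_add_even (a b : Int)
        (ha : ∃ k, a = 2 * k) (hb : ∃ m, b = 2 * m) :
        ∃ n, a + b = 2 * n :=
      match ha, hb with
      | ⟨k, hk⟩, ⟨m, hm⟩ => ⟨k + m, by omega⟩
    ```

    Once a statement is in this form, there is no ambiguity left for a solver to exploit: the checker accepts a proof or it does not.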

    In this case, the AlphaZero component is trained to suggest proof steps in Lean, a mathematical programming language. If a proof step is valid, it will compile correctly in Lean; if it isn’t, it won’t. This provides a reward signal, much like points in a video game, to AlphaZero. In this way, the AlphaZero component of AlphaSolver learns by trial and error to take steps that are more likely to result in valid solutions. AlphaSolver was trained on about one million examples of IMO problems in the weeks leading up to the competition, Google DeepMind said, and it continued to improve while working on the IMO contest problems.
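
    The toy sketch below (an illustration under invented assumptions, not DeepMind’s training code) shows the shape of that trial-and-error loop: proposed steps that the checker accepts earn a reward and become more likely to be proposed again, while steps that fail to check fade away.

    ```python
    import random

    # Toy reinforcement loop: lean_accepts() stands in for "does this
    # proof step compile in Lean?", and the binary reward nudges the
    # proposal weights toward steps the verifier accepts.
    STEPS = ["rewrite", "induction", "case_split", "simplify"]

    def lean_accepts(step: str) -> bool:
        """Stand-in verifier: pretend only these two steps are valid."""
        return step in {"induction", "simplify"}

    weights = {s: 1.0 for s in STEPS}  # every step starts equally likely

    for _ in range(2000):
        step = random.choices(STEPS, weights=[weights[s] for s in STEPS])[0]
        reward = 1.0 if lean_accepts(step) else 0.0
        # Reinforce accepted steps, decay rejected ones; the floor keeps
        # every option explorable.
        weights[step] = max(0.05, weights[step] + 0.2 * (reward - 0.5))

    print(sorted(weights, key=weights.get, reverse=True))
    # Verifier-approved steps end up at the front of the ranking.
    ```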

    In cases where the problem involved geometry, it was handed instead to AlphaGeometry 2. AlphaGeometry 2 is also a hybrid system, combining an LLM component with a component that uses symbolic reasoning. The new AlphaGeometry could solve 83% of IMO geometry problems, compared with just 53% for its predecessor. In one case, AlphaGeometry 2 solved a highly complex geometry problem in just 19 seconds, a feat more akin to a flash of inspiration than a brute-force approach based on endless trial and error. In another case, the proof AlphaGeometry offered initially confused some mathematicians who examined it, but they determined it was actually an elegant and highly unusual way of solving the problem.
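
    To illustrate that division of labor (every name below is a hypothetical placeholder, not DeepMind’s API), one component creatively proposes additions to the diagram while the other mechanically checks what can now be deduced:

    ```python
    # Hypothetical sketch of a neuro-symbolic loop: a "creative" proposer
    # suggests auxiliary constructions, and a symbolic engine verifies
    # whether the goal has become provable. All logic is a placeholder.

    def propose_construction(attempt: int) -> str:
        """Stand-in for the LLM half: suggest an auxiliary object."""
        ideas = ["circumcircle of ABC", "midpoint of BC",
                 "reflection of A over BC"]
        return ideas[attempt % len(ideas)]

    def deduction_closes_goal(construction: str) -> bool:
        """Stand-in for the symbolic half: apply geometry rules
        exhaustively and report whether the goal is now reachable."""
        return construction == "midpoint of BC"

    for attempt in range(10):
        aux = propose_construction(attempt)
        if deduction_closes_goal(aux):
            print(f"proof found after adding: {aux}")
            break
    ```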

    Impact on human mathematicians

    Pushmeet Kohli, who heads Google DeepMind’s AI for science division, said he saw AlphaSolver and AlphaGeometry 2 primarily as tools for helping mathematicians in their work. Silver said he did not see these new mathematical AIs challenging the relevancy of academic mathematicians.

    But Timothy Gowers, who is a director of research in mathematics at the University of Cambridge and a past winner of the Fields Medal—a prize that is awarded only once every four years to two to four mathematicians under the age of 40 who have contributed the most to the field—reviewed the proofs AlphaSolver and AlphaGeometry 2 produced and said he came away impressed. “I could recognize familiar-looking arguments that had come out of the system,” he said.

    He also said that some of the problems required him, as a human mathematician, “to dig quite deep” and come up with what he called “a sort of magic key” that suddenly turns a problem that looks unsolvable into one that is eminently solvable. Gowers said he was surprised that the system had discovered a few of these magic keys, because his intuition is that they would be difficult to stumble upon by naïve trial and error without any underlying understanding of the mathematical principles involved. But he said he reserved judgment on whether this meant AlphaSolver had actually developed something akin to mathematical intuition. He said more research would be needed to understand exactly how the system managed to puzzle out answers to the IMO problems.

    Gowers noted that IMO problems were much simpler than what research mathematicians work on. But, compared with Kohli and Silver, Gowers was far less sanguine about what the future would hold if AI models kept improving at the current clip. “I actually think that when computers become really good at finding extremely hard proofs, that's more or less game over for mathematical research,” he said. “I'm not trying to suggest that we're all that close to that at the moment, but I just, I'm thinking a long way ahead, but how long ahead that really is, is very hard to say.”

    This story was originally featured on Fortune.com
