
Mathematical exploration and discovery at scale

Bogdan Georgiev, Javier Gómez-Serrano, Terence Tao, Adam Zsolt Wagner

TL;DR

The paper addresses automating mathematical discovery at scale by coupling large language models with evolutionary search to propose, test, and refine algorithmic constructions across diverse mathematical problems. It applies AlphaEvolve, a Generator–Evaluator framework with two modes (search and generalizer), to 67 problems spanning analysis, combinatorics, geometry, and number theory, rediscovering the best known solutions in most cases, improving on them in several, and in some cases generalizing patterns into formulas valid for all inputs. It also integrates proof tools such as Deep Think and AlphaProof to move from empirical patterns to automated proofs and verification, while acknowledging that AlphaEvolve is not a universal solver and noting issues such as overfitting and prompt sensitivity. The results highlight the potential of AI-assisted mathematical discovery at scale to complement human intuition, accelerate exploration of large search spaces, and enable new interactions between mathematicians and AI systems, with practical implications for discovery and verification workflows.

Abstract

AlphaEvolve is a generic evolutionary coding agent that combines the generative capabilities of LLMs with automated evaluation in an iterative evolutionary framework that proposes, tests, and refines algorithmic solutions to challenging scientific and practical problems. In this paper we showcase AlphaEvolve as a tool for autonomously discovering novel mathematical constructions and advancing our understanding of long-standing open problems. To demonstrate its breadth, we considered a list of 67 problems spanning mathematical analysis, combinatorics, geometry, and number theory. The system rediscovered the best known solutions in most of the cases and discovered improved solutions in several. In some instances, AlphaEvolve is also able to generalize results for a finite number of input values into a formula valid for all input values. Furthermore, we are able to combine this methodology with Deep Think and AlphaProof in a broader framework where the additional proof assistants and reasoning systems provide automated proof generation and further mathematical insights. These results demonstrate that LLM-guided evolutionary search can autonomously discover mathematical constructions that complement human intuition, at times matching or even improving the best known results, highlighting the potential for significant new ways of interaction between mathematicians and AI systems. We present AlphaEvolve as a powerful new tool for mathematical discovery, capable of exploring vast search spaces to solve complex optimization problems at scale, often with significantly reduced requirements on preparation and computation time.
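
The propose/test/refine loop described in the abstract can be illustrated with a minimal sketch. It is not AlphaEvolve's implementation: candidates here are plain lists of floats rather than programs, the LLM generator is replaced by a hypothetical random mutation (`gaussian_mutation`), and `toy_score` is an invented objective standing in for a problem-specific evaluator. Lower scores are better, and parent selection is a simple size-2 tournament over a fixed-size population; all of these are assumptions made for illustration.

```python
import random
from typing import Callable, List, Tuple

Candidate = List[float]


def evolve(
    evaluate: Callable[[Candidate], float],
    propose: Callable[[Candidate, random.Random], Candidate],
    seed: Candidate,
    generations: int = 200,
    population_size: int = 8,
    rng_seed: int = 0,
) -> Tuple[Candidate, float]:
    """Propose, test, refine: return the best candidate found and its score (lower is better)."""
    rng = random.Random(rng_seed)
    # Population of (score, candidate) pairs, seeded with the initial construction.
    population: List[Tuple[float, Candidate]] = [(evaluate(seed), seed)]
    for _ in range(generations):
        # Parent selection: size-2 tournament, biased toward better (lower) scores.
        a, b = rng.choice(population), rng.choice(population)
        parent = a[1] if a[0] <= b[0] else b[1]
        # Generator step (hypothetical stand-in for an LLM proposing a modified program).
        child = propose(parent, rng)
        # Evaluator step: automated scoring of the proposal.
        population.append((evaluate(child), child))
        # Refinement: keep only the best `population_size` candidates.
        population.sort(key=lambda pair: pair[0])
        del population[population_size:]
    best_score, best = population[0]
    return best, best_score


def gaussian_mutation(parent: Candidate, rng: random.Random) -> Candidate:
    """Hypothetical generator: perturb one randomly chosen coordinate."""
    child = list(parent)
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, 0.1)
    return child


def toy_score(v: Candidate) -> float:
    """Invented objective: squared distance to the point (0.5, ..., 0.5)."""
    return sum((x - 0.5) ** 2 for x in v)


best, score = evolve(toy_score, gaussian_mutation, seed=[0.0, 0.0, 0.0, 0.0])
print(f"best score: {score:.4f}")
```

In this sketch the evaluator is kept separate from the generator, so the same loop could be reused on another problem by swapping only `evaluate` and `propose`.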

Paper Structure

This paper contains 60 sections, 137 equations, 35 figures, and 6 tables.

Figures (35)

  • Figure 1: Performance on Problem [first-auto]: running AlphaEvolve with more parallel threads finds good constructions faster, but at a greater total compute cost. Results are averaged over 100 experiments with 2 CPU threads, 40 experiments with 5 CPU threads, 20 experiments with 10 CPU threads, and 10 experiments with 20 CPU threads.
  • Figure 2: Comparison of 50 experiments on Problem [first-auto] using a cheaper LLM and 20 experiments using a more expensive LLM. The experiments with the cheaper LLM required about twice as many LLM calls as those with the more expensive one, and this ratio tends to be even larger for more difficult problems.
  • Figure 3: Left: the constructions produced by AlphaEvolve for Problem [first-auto]; right: their autoconvolutions. From top to bottom, their scores are 1.5053, 1.5040, and 1.5032 (smaller is better).
  • Figure 4: Left: the best construction for Problem [second-auto] discovered by AlphaEvolve; right: its autoconvolution. Both functions are highly irregular and difficult to plot.
  • Figure 5: AlphaEvolve applied to optimizing the total union area of (top) triangles and (bottom) parallelograms using our search method: (left) total area of AlphaEvolve's constructions compared with Keich's construction; (right) the corresponding $S^T$ and $S^P$ scores for both.
  • ...and 30 more figures
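
Figures 3 and 4 score constructions through their autoconvolutions, with smaller scores being better. The exact objective is not stated on this page, so the following sketch assumes a common normalization: for a nonnegative function f supported on [-1/4, 1/4], the score is max(f*f) divided by (∫f)^2. It also assumes, for illustration only, that a candidate is a step function with equal-width pieces; the name `autoconvolution_score` is hypothetical. Under these assumptions the score can be computed exactly: f*f is piecewise linear with breakpoints spaced one piece-width apart, its value at each breakpoint is the piece width times the discrete self-convolution of the heights, and so its maximum is attained at a breakpoint.

```python
from typing import Sequence


def autoconvolution_score(heights: Sequence[float]) -> float:
    """Assumed score max(f*f) / (integral of f)^2 for a nonnegative step function f
    on [-1/4, 1/4] with len(heights) equal-width pieces (lower is better)."""
    n = len(heights)
    if n == 0 or any(h < 0 for h in heights) or sum(heights) <= 0:
        raise ValueError("heights must be nonempty, nonnegative, with positive sum")
    width = 0.5 / n  # the support has length 1/2, split into n pieces
    # Discrete self-convolution c[k] = sum over i + j = k of h_i * h_j, k = 0..2n-2.
    conv = [0.0] * (2 * n - 1)
    for i, hi in enumerate(heights):
        for j, hj in enumerate(heights):
            conv[i + j] += hi * hj
    # f*f is piecewise linear; at its breakpoints it equals width * conv[k],
    # so its maximum is width * max(conv).
    peak = width * max(conv)
    integral = width * sum(heights)
    return peak / integral ** 2


print(autoconvolution_score([1.0] * 10))
print(autoconvolution_score([1.0, 0.5, 1.0]))
```

Under this assumed normalization a constant function scores exactly 2, and the score is unchanged when all heights are multiplied by the same positive constant, so a search only needs to explore the shape of f.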