
Generative Modeling for Mathematical Discovery

Jordan S. Ellenberg, Cristofero S. Fraser-Taliente, Thomas R. Harvey, Karan Srivastava, Andrew V. Sutherland

TL;DR

This work presents a practical implementation of funsearch, an LLM-guided genetic algorithm that evolves priority functions to guide the construction of mathematical objects, using an island model to maintain diversity. It benchmarks three problems (cap sets in $\mathbb{Z}_3^n$, narrow admissible tuples in the DHL$[k,j]$ framework, and isosceles-free subsets of $n\times n$ grids), showing that learned priority functions can yield nontrivial solutions and sometimes generalize beyond the instance they were trained on. A key contribution is the OpenRouter-enabled pipeline, which demands little machine-learning expertise, preserves the separation between the solve and priority functions, and supports multiple models and monitoring tools, making experimentation accessible to working mathematicians. The results indicate that model cost does not reliably predict performance, that extending runs often does not improve outcomes, and that the presentation and symmetries of a problem can strongly influence generalization; together these findings suggest practical strategies for deploying LLM-guided search in combinatorial discovery. Overall, the paper provides a flexible framework and repository for applying LLM-driven genetic search to diverse mathematical problems at modest computational cost.
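As a rough illustration of the island model, the Python sketch below keeps several independent populations of candidate programs, repeatedly asks a proposer for a variant of the best programs on a randomly chosen island, and periodically resets the weaker half of the islands from the stronger half. It is a toy under stated assumptions, not the authors' code: the LLM call is replaced by a hypothetical `propose_variant` that only perturbs a numeric constant, `evaluate` scores programs against an invented target, and the reset rule is modeled on the one described for the original FunSearch, which this implementation may handle differently. All names and parameters here are hypothetical.

```python
import random
from typing import Optional

# Hypothetical starting program; real runs would start from a problem-specific priority function.
SEED_PROGRAM = "def priority(x):\n    return 1.000 * x\n"


def evaluate(source: str) -> Optional[float]:
    """Execute a candidate program and score it; None means the program failed (toy objective)."""
    namespace: dict = {}
    try:
        exec(source, namespace)
        value = namespace["priority"](1.0)
    except Exception:
        return None
    return -abs(value - 3.0)  # invented target: the best programs have priority(1.0) == 3


def propose_variant(parents: list, rng: random.Random) -> str:
    """Hypothetical stand-in for the LLM: perturb the constant in the best parent program."""
    best = parents[-1]
    constant = float(best.split("return ")[1].split(" *")[0])
    return f"def priority(x):\n    return {constant + rng.gauss(0.0, 0.5):.3f} * x\n"


def run_islands(num_islands: int = 4, steps: int = 400, reset_every: int = 100, seed: int = 0):
    """Evolve programs on separate islands and return the best (score, source) pair found."""
    rng = random.Random(seed)
    start = (evaluate(SEED_PROGRAM), SEED_PROGRAM)
    islands = [[start] for _ in range(num_islands)]
    for step in range(1, steps + 1):
        island = rng.choice(islands)
        # The "prompt" would contain the two best programs on this island, weakest first.
        parents = [src for _, src in sorted(island)[-2:]]
        child = propose_variant(parents, rng)
        score = evaluate(child)
        if score is not None:
            island.append((score, child))
        if step % reset_every == 0:
            # Discard the weaker half of the islands and reseed each one with the
            # best program from a randomly chosen surviving island.
            islands.sort(key=lambda isl: max(isl)[0])
            half = num_islands // 2
            survivors = islands[half:]
            for i in range(half):
                islands[i] = [max(rng.choice(survivors))]
    return max(max(isl) for isl in islands)
```

Calling `run_islands()` returns the best `(score, source)` pair seen across all islands. Keeping islands separate lets different lineages explore independently, while the periodic reset stops compute from being spent on lineages that have stalled; in a real pipeline, `propose_variant` would be an LLM request and `evaluate` a problem-specific scorer.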

Abstract

We present a new implementation of the LLM-driven genetic algorithm funsearch, whose aim is to generate examples of interest to mathematicians and which has already had some success on problems in extremal combinatorics. Our implementation is designed to be useful in practice for working mathematicians; it requires neither expertise in machine learning nor access to high-performance computing resources. Applying funsearch to a new problem involves modifying a small segment of Python code and selecting a large language model (LLM) from one of many third-party providers. We benchmarked our implementation on three different problems, obtaining metrics that may inform applications of funsearch to new problems. Our results demonstrate that funsearch successfully learns in a variety of combinatorial and number-theoretic settings, and in some contexts learns principles that generalize beyond the problem it was originally trained on.
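For the cap set problem, the small segment of code a user edits is a priority function, while a fixed solve routine turns priorities into a construction. The sketch below illustrates that separation under stated assumptions; it is not the repository's code. The names `priority`, `solve`, and `evaluate` are hypothetical, and the baseline priority (prefer vectors with fewer entries equal to 1) is only a placeholder. It relies on the fact that three distinct points $a, b, c \in \mathbb{Z}_3^n$ are collinear exactly when $a + b + c = 0$.

```python
import itertools

Vector = tuple[int, ...]


def priority(v: Vector, n: int) -> float:
    """The part funsearch would evolve. Hypothetical baseline: prefer vectors with fewer 1-entries."""
    return -float(v.count(1))


def solve(n: int) -> list:
    """Greedily build a cap set in Z_3^n, trying vectors in order of decreasing priority."""
    candidates = sorted(itertools.product(range(3), repeat=n),
                        key=lambda v: priority(v, n), reverse=True)
    cap: list = []
    blocked: set = set()  # points that would complete a line with two already-chosen points
    for v in candidates:
        if v in blocked:
            continue
        for u in cap:
            # u, v and w are collinear iff u + v + w = 0, so adding v forbids w = -(u + v).
            blocked.add(tuple((-(a + b)) % 3 for a, b in zip(u, v)))
        cap.append(v)
    return cap


def evaluate(n: int) -> int:
    """Score the genetic search would maximize: the size of the constructed cap set."""
    return len(solve(n))
```

With this baseline, every vector in $\{0,2\}^n$ is tried first, and these vectors already form a cap set (no coordinate of three distinct such vectors can sum to $0 \bmod 3$ unless all three agree there), so `evaluate(n)` is at least $2^n$. A funsearch run would rewrite only the body of `priority` in the hope of pushing that score higher, leaving `solve` untouched.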

Paper Structure

This paper contains 10 sections, 1 equation, 14 figures, 5 tables.

Figures (14)

  • Figure 1: The basic structure of funsearch.
  • Figure 2: The best score as a logarithmic function of relative token usage for the longer runs in the long-run table.
  • Figure 3: Sizes of isosceles-free sets generated by models trained on different values of $n$.
  • Figure 4: Isosceles-free subset of size $46$ for a $32 \times 32$ grid and heatmap representing the priorities assigned to each point by a model trained on $n=9$. The true largest subset has size $56$.
  • Figure 5: Sizes of isosceles-free sets on a lattice embedded in a torus, generated by models trained on different values of $n$.
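Figures 3 to 5 concern isosceles-free subsets of an $n \times n$ grid built from a learned priority. A minimal sketch of such a priority-driven greedy construction follows, under two assumptions this summary does not state: degenerate triples (for example, three equally spaced collinear points) count as isosceles, and on the torus each coordinate difference is measured the shorter way around modulo $n$. The function names, the `torus` flag, and the example priority are hypothetical and have nothing to do with the model trained on $n = 9$.

```python
from typing import Callable

Point = tuple[int, int]


def sq_dist(p: Point, q: Point, n: int, torus: bool) -> int:
    """Squared Euclidean distance; on the (assumed) torus, coordinate differences wrap mod n."""
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    if torus:
        dx, dy = min(dx, n - dx), min(dy, n - dy)
    return dx * dx + dy * dy


def greedy_isosceles_free(n: int, priority: Callable[[Point], float], torus: bool = False) -> list:
    """Add grid points in order of decreasing priority, skipping any that create an isosceles triple."""
    points = sorted(((x, y) for x in range(n) for y in range(n)), key=priority, reverse=True)
    chosen: list = []
    dists: dict = {}  # dists[q] = squared distances from chosen point q to the other chosen points
    for p in points:
        to_chosen = [sq_dist(p, q, n, torus) for q in chosen]
        if len(set(to_chosen)) < len(to_chosen):
            continue  # p would be an apex: two chosen points lie at equal distance from p
        if any(d in dists[q] for q, d in zip(chosen, to_chosen)):
            continue  # some chosen q would be an apex: p and another chosen point are equidistant from q
        for q, d in zip(chosen, to_chosen):
            dists[q].add(d)
        dists[p] = set(to_chosen)
        chosen.append(p)
    return chosen


def is_isosceles_free(points: list, n: int, torus: bool = False) -> bool:
    """Brute-force check: no point of the set is equidistant from two other points of the set."""
    for apex in points:
        ds = [sq_dist(apex, q, n, torus) for q in points if q != apex]
        if len(ds) != len(set(ds)):
            return False
    return True
```

For instance, `greedy_isosceles_free(9, lambda p: (3 * p[0] + 5 * p[1]) % 7)` returns a valid subset whose size depends entirely on the ordering the priority induces; evolving that ordering is the job funsearch takes on. Caching each chosen point's distance set keeps every insertion check linear in the current subset size, rather than requiring the cubic brute-force check.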