Table of Contents
Fetching ...

Advancing mathematics research with generative AI

Lisa Carbone

TL;DR

The work surveys how generative AI—particularly LLMs, LRMs, and LCMs—can assist mathematical research by identifying patterns, guiding computational work, and enabling interactive collaboration, while acknowledging fundamental limits in formal deduction and verification. It details the mechanics of tokens, embeddings, and context windows, and discusses how prompt design, chain-of-thought reasoning, and neuro-symbolic verification enhance reliability. The paper also covers practical integrations with Computer Algebra Systems and formal proof assistants (e.g., Lean), case studies in Combinatorial Group Theory, and a future vision of iterative human–AI collaboration that accelerates discovery. Overall, it argues for a hybrid, carefully verified workflow where AI acts as a creative partner, not a sole oracle, in mathematical practice.

Abstract

The main drawback of using generative AI models for advanced mathematics is that these models are not primarily logical reasoning engines. However, Large Language Models, and their refinements, can pick up on patterns in higher mathematics that are difficult for humans to see. By putting the design of generative AI models to their advantage, mathematicians may use them as powerful interactive assistants that can carry out laborious tasks, generate and debug code, check examples, formulate conjectures and more. We discuss how generative AI models can be used to advance mathematics research. We also discuss their integration with neuro-symbolic solvers, Computer Algebra Systems and formal proof assistants such as Lean.

Advancing mathematics research with generative AI

TL;DR

The work surveys how generative AI—particularly LLMs, LRMs, and LCMs—can assist mathematical research by identifying patterns, guiding computational work, and enabling interactive collaboration, while acknowledging fundamental limits in formal deduction and verification. It details the mechanics of tokens, embeddings, and context windows, and discusses how prompt design, chain-of-thought reasoning, and neuro-symbolic verification enhance reliability. The paper also covers practical integrations with Computer Algebra Systems and formal proof assistants (e.g., Lean), case studies in Combinatorial Group Theory, and a future vision of iterative human–AI collaboration that accelerates discovery. Overall, it argues for a hybrid, carefully verified workflow where AI acts as a creative partner, not a sole oracle, in mathematical practice.

Abstract

The main drawback of using generative AI models for advanced mathematics is that these models are not primarily logical reasoning engines. However, Large Language Models, and their refinements, can pick up on patterns in higher mathematics that are difficult for humans to see. By putting the design of generative AI models to their advantage, mathematicians may use them as powerful interactive assistants that can carry out laborious tasks, generate and debug code, check examples, formulate conjectures and more. We discuss how generative AI models can be used to advance mathematics research. We also discuss their integration with neuro-symbolic solvers, Computer Algebra Systems and formal proof assistants such as Lean.

Paper Structure

This paper contains 36 sections, 9 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: A simplified view of an LLM as a 'knowledge graph'.
  • Figure 2: A hypothetical embedding for $d=2$.
  • Figure 3: Projection of a token embedding $\varphi(w)$ in the embedding space $\mathbb{R}^d$ into specialized vectors ($q,k,v$) via linear transformations $T_q, T_k, T_v$, generation of the layer-$\ell$ output $\zeta^{(\ell)}(w)$, and repeated processing through a stack of transformer (neural network) blocks, including the feed-forward piecewise-linear map, producing the final contextualized representation $\zeta^{(N)}(w)$.
  • Figure 4: Unlike standard LLMs which predict tokens directly from the transformer stack, an LRM generates a computational plan (Python script), executes it in an external deterministic sandbox, then integrates the result back into the neural stream for the final response. Verification is built-in. The model no longer predicts mathematical answers. It predicts the algorithms needed to find the answers and it executes the algorithms.
  • Figure 5: On the left, (in blue) a smooth 1-dimensional manifold embedded in $\mathbb{R}^2$, where every point has a neighborhood homeomorphic to an open interval. On the right, (in red) a stratified subset of $\mathbb{R}^2$, hypothetically resembling an LLM token embedding space. It is a union of 1-dimensional strata crossing at a singular point and an isolated 0-dimensional stratum.
  • ...and 4 more figures