Table of Contents
Fetching ...

Learning Formal Mathematics From Intrinsic Motivation

Gabriel Poesia, David Broman, Nick Haber, Noah D. Goodman

TL;DR

Minimo investigates autonomously building mathematical reasoning by starting from axioms in dependent type theory and jointly learning conjecturing and proving. It combines constrained decoding for valid conjecture generation, a shared Transformer-based policy/value LM for proof search in a finite environment, and hindsight relabeling to dramatically improve sample efficiency. Across propositional logic, arithmetic, and group theory, Minimo demonstrates self-improvement in both conjecturing harder provable problems and proving them, while also improving on a set of human-written theorems. The work advances toward compute-bound, self-improving agents capable of discovering new mathematics from first principles, albeit currently limited by library growth and scalability challenges.

Abstract

How did humanity coax mathematics from the aether? We explore the Platonic view that mathematics can be discovered from its axioms - a game of conjecture and proof. We describe Minimo (Mathematics from Intrinsic Motivation): an agent that jointly learns to pose challenging problems for itself (conjecturing) and solve them (theorem proving). Given a mathematical domain axiomatized in dependent type theory, we first combine methods for constrained decoding and type-directed synthesis to sample valid conjectures from a language model. Our method guarantees well-formed conjectures by construction, even as we start with a randomly initialized model. We use the same model to represent a policy and value function for guiding proof search. Our agent targets generating hard but provable conjectures - a moving target, since its own theorem proving ability also improves as it trains. We propose novel methods for hindsight relabeling on proof search trees to significantly improve the agent's sample efficiency in both tasks. Experiments on 3 axiomatic domains (propositional logic, arithmetic and group theory) demonstrate that our agent can bootstrap from only the axioms, self-improving in generating true and challenging conjectures and in finding proofs.

Learning Formal Mathematics From Intrinsic Motivation

TL;DR

Minimo investigates autonomously building mathematical reasoning by starting from axioms in dependent type theory and jointly learning conjecturing and proving. It combines constrained decoding for valid conjecture generation, a shared Transformer-based policy/value LM for proof search in a finite environment, and hindsight relabeling to dramatically improve sample efficiency. Across propositional logic, arithmetic, and group theory, Minimo demonstrates self-improvement in both conjecturing harder provable problems and proving them, while also improving on a set of human-written theorems. The work advances toward compute-bound, self-improving agents capable of discovering new mathematics from first principles, albeit currently limited by library growth and scalability challenges.

Abstract

How did humanity coax mathematics from the aether? We explore the Platonic view that mathematics can be discovered from its axioms - a game of conjecture and proof. We describe Minimo (Mathematics from Intrinsic Motivation): an agent that jointly learns to pose challenging problems for itself (conjecturing) and solve them (theorem proving). Given a mathematical domain axiomatized in dependent type theory, we first combine methods for constrained decoding and type-directed synthesis to sample valid conjectures from a language model. Our method guarantees well-formed conjectures by construction, even as we start with a randomly initialized model. We use the same model to represent a policy and value function for guiding proof search. Our agent targets generating hard but provable conjectures - a moving target, since its own theorem proving ability also improves as it trains. We propose novel methods for hindsight relabeling on proof search trees to significantly improve the agent's sample efficiency in both tasks. Experiments on 3 axiomatic domains (propositional logic, arithmetic and group theory) demonstrate that our agent can bootstrap from only the axioms, self-improving in generating true and challenging conjectures and in finding proofs.
Paper Structure (30 sections, 6 figures)

This paper contains 30 sections, 6 figures.

Figures (6)

  • Figure 1: We train mathematical reasoning agents starting only from the axioms of a given formal domain, where they jointly learn to pose challenging but provable conjectures and to find their proofs.
  • Figure 2: Difficulty of proved conjectures found in each iteration of Minimo, evaluate as the log-probability of the proof under the policy checkpoints after each iteration of the training loop (with standard error bands for runs with 3 random seeds).
  • Figure 3: Difficulty of proved conjectures proposed in each iteration under the current policy at that same iteration, comparing when using and not using hindsight relabeling to generate new proofs and conjectures, with standard error bands for runs with 3 random seeds. Ideal behavior would be a flat line, representing constant relative difficulty. Hindsight significantly helps the agent conjecture propose harder problems.
  • Figure 4: Success rate of our agents at proving theorems from the "Introduction to Metamathematics" textbook and the Natural Number Game. As agents train on their own conjectures, they also improve at solving problems from these two human-written sources.
  • Figure 5: Relation between MCTS iterations until the proof is found, vs log-likelihood of the proof that is found. The higher the likelihood, the faster MCTS is in finding the proof.
  • ...and 1 more figures