Table of Contents
Fetching ...

3D-Prover: Diversity Driven Theorem Proving With Determinantal Point Processes

Sean Lamont, Christian Walder, Amir Dezfouli, Paul Montague, Michael Norrish

TL;DR

Automated theorem proving suffers from an explosion in the search space as tactic choices proliferate and execution errors waste resources. The authors propose 3D-Prover, a modular, diversity-driven filtering layer that learns transition-aware tactic embeddings from synthetic proof transitions and uses Determinantal Point Processes to select a diverse, high-quality subset of tactics to execute. They demonstrate improvements on miniF2F and LeanDojo benchmarks by augmenting ReProver and InternLM2.5-Step-Prover, achieving higher proof success, faster execution, and greater tactic diversity. By learning environment dynamics and enforcing diversity during search, 3D-Prover offers a practical approach to prune search and enable deeper automated proofs with modest overhead.

Abstract

A key challenge in automated formal reasoning is the intractable search space, which grows exponentially with the depth of the proof. This branching is caused by the large number of candidate proof tactics which can be applied to a given goal. Nonetheless, many of these tactics are semantically similar or lead to an execution error, wasting valuable resources in both cases. We address the problem of effectively pruning this search, using only synthetic data generated from previous proof attempts. We first demonstrate that it is possible to generate semantically aware tactic representations which capture the effect on the proving environment, likelihood of success, and execution time. We then propose a novel filtering mechanism which leverages these representations to select semantically diverse and high quality tactics, using Determinantal Point Processes. Our approach, 3D- Prover, is designed to be general, and to augment any underlying tactic generator. We demonstrate the effectiveness of 3D-Prover on the miniF2F and LeanDojo benchmarks by augmenting popular open source proving LLMs. We show that our approach leads to an increase in the overall proof rate, as well as a significant improvement in the tactic success rate, execution time and diversity. We make our code available at https://github.com/sean-lamont/3D-Prover.

3D-Prover: Diversity Driven Theorem Proving With Determinantal Point Processes

TL;DR

Automated theorem proving suffers from an explosion in the search space as tactic choices proliferate and execution errors waste resources. The authors propose 3D-Prover, a modular, diversity-driven filtering layer that learns transition-aware tactic embeddings from synthetic proof transitions and uses Determinantal Point Processes to select a diverse, high-quality subset of tactics to execute. They demonstrate improvements on miniF2F and LeanDojo benchmarks by augmenting ReProver and InternLM2.5-Step-Prover, achieving higher proof success, faster execution, and greater tactic diversity. By learning environment dynamics and enforcing diversity during search, 3D-Prover offers a practical approach to prune search and enable deeper automated proofs with modest overhead.

Abstract

A key challenge in automated formal reasoning is the intractable search space, which grows exponentially with the depth of the proof. This branching is caused by the large number of candidate proof tactics which can be applied to a given goal. Nonetheless, many of these tactics are semantically similar or lead to an execution error, wasting valuable resources in both cases. We address the problem of effectively pruning this search, using only synthetic data generated from previous proof attempts. We first demonstrate that it is possible to generate semantically aware tactic representations which capture the effect on the proving environment, likelihood of success, and execution time. We then propose a novel filtering mechanism which leverages these representations to select semantically diverse and high quality tactics, using Determinantal Point Processes. Our approach, 3D- Prover, is designed to be general, and to augment any underlying tactic generator. We demonstrate the effectiveness of 3D-Prover on the miniF2F and LeanDojo benchmarks by augmenting popular open source proving LLMs. We show that our approach leads to an increase in the overall proof rate, as well as a significant improvement in the tactic success rate, execution time and diversity. We make our code available at https://github.com/sean-lamont/3D-Prover.

Paper Structure

This paper contains 29 sections, 2 equations, 6 figures, 13 tables, 1 algorithm.

Figures (6)

  • Figure 1: An example node expansion for a failed ReProver attempt, which 3D-Prover was able to prove. Tactics on the left result in the same proof state, tactics on the right result in an error, and tactics in the centre result in a unique proof state. The high error rate and tactic similarity motivates our filtering approach, which prunes the search space to give a diverse set of subgoals.
  • Figure 2: Our Combined architecture for learning transition aware tactic embeddings. The tactic $\bf{t}$ and goal $\bf{g}$ are concatenated and passed through the Encoder $E$. A representation vector $\bf{e}$ is generated by mean-pooling over the tactic token embeddings $\bf{t'}$. The Predictor $P$ takes this embedding and predicts whether the tactic results in an error (Status), and the execution time (Time). The Decoder $D$ takes the embedding and goal to predict the environment response (Output), which is either an error message or new goals to prove. This setup yields a compact representation of the tactic which captures its effect on the proving environment, enabling our proposed filtering model.
  • Figure 3: Visualisation of DPP for tactic filtering. The tactic embeddings from the transition model are scaled by quality scores, before a subset of tactics are selected using $k$-DPP. Subsets are chosen proportionally to the area spanned by their elements, giving a combination of quality and diversity. For this simplified example, we take the 2D PCA projection of embeddings for tactics in Figure \ref{['fig:tree']}, setting the quality to the scaled generator logits. Comparing the shaded areas in (b) and assuming subst c and rw h$_1$ have been selected, we see that symmetry is favoured over simp [h$_1$]. Although simp [h$_1$] is scored higher by the generator, it is less diverse with respect to $\texttt{subst\;c}$ and $\texttt{rw h$_1$}$.
  • Figure 4: Cosine similarity between tactic embeddings resulting in unique subgoals, for a sample root node in miniF2F-valid. The top value gives the similarity for embeddings from 3D-Prover, while the bottom gives the similarity for embeddings from an Autoencoder. We see that 3D-Prover better separates these semantically distinct tactics, in comparison to the Autoencoder, which only separates based on their syntax.
  • Figure 5: Distribution of cosine similarity for tactic embeddings resulting in unique subgoals, averaged over root nodes in miniF2F-valid. We see that 3D-Prover gives embeddings which better separate these semantically distinct tactics, in comparison the the syntax focused embeddings of the Autoencoder.
  • ...and 1 more figures