Table of Contents
Fetching ...

BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving

Sean Lamont, Michael Norrish, Amir Dezfouli, Christian Walder, Paul Montague

TL;DR

BAIT is presented, a framework for the fair and streamlined comparison of learning approaches in ITP, and it is found that Structure Aware Transformers perform particularly well, improving on techniques associated with the original problem sets.

Abstract

Artificial Intelligence for Theorem Proving has given rise to a plethora of benchmarks and methodologies, particularly in Interactive Theorem Proving (ITP). Research in the area is fragmented, with a diverse set of approaches being spread across several ITP systems. This presents a significant challenge to the comparison of methods, which are often complex and difficult to replicate. Addressing this, we present BAIT, a framework for fair and streamlined comparison of learning approaches in ITP. We demonstrate BAIT's capabilities with an in-depth comparison, across several ITP benchmarks, of state-of-the-art architectures applied to the problem of formula embedding. We find that Structure Aware Transformers perform particularly well, improving on techniques associated with the original problem sets. BAIT also allows us to assess the end-to-end proving performance of systems built on interactive environments. This unified perspective reveals a novel end-to-end system that improves on prior work. We also provide a qualitative analysis, illustrating that improved performance is associated with more semantically-aware embeddings. By streamlining the implementation and comparison of Machine Learning algorithms in the ITP context, we anticipate BAIT will be a springboard for future research.

BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving

TL;DR

BAIT is presented, a framework for the fair and streamlined comparison of learning approaches in ITP, and it is found that Structure Aware Transformers perform particularly well, improving on techniques associated with the original problem sets.

Abstract

Artificial Intelligence for Theorem Proving has given rise to a plethora of benchmarks and methodologies, particularly in Interactive Theorem Proving (ITP). Research in the area is fragmented, with a diverse set of approaches being spread across several ITP systems. This presents a significant challenge to the comparison of methods, which are often complex and difficult to replicate. Addressing this, we present BAIT, a framework for fair and streamlined comparison of learning approaches in ITP. We demonstrate BAIT's capabilities with an in-depth comparison, across several ITP benchmarks, of state-of-the-art architectures applied to the problem of formula embedding. We find that Structure Aware Transformers perform particularly well, improving on techniques associated with the original problem sets. BAIT also allows us to assess the end-to-end proving performance of systems built on interactive environments. This unified perspective reveals a novel end-to-end system that improves on prior work. We also provide a qualitative analysis, illustrating that improved performance is associated with more semantically-aware embeddings. By streamlining the implementation and comparison of Machine Learning algorithms in the ITP context, we anticipate BAIT will be a springboard for future research.
Paper Structure (30 sections, 2 equations, 3 figures, 4 tables)

This paper contains 30 sections, 2 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: AI-ITP setup. A model $\pi$ interacts with a proving environment $\mathcal{E}$, mapping a state $s$ to an action $(t,g)$, which defines tactic(s) $t$ to apply to goal(s) $g \subseteq s$. $\pi$ is trained with rewards or Data from processed Proof Logs, sourced from human proofs or agent-environment interactions.
  • Figure 2: Overview of Bait. The Experiment module abstractly defines tasks such as premise selection or RL. Experiment instances take a configuration specifying the Data, Model and Environment instances to use, with Checkpoints and Logs as output. Proof Data, either from existing sources or generated logs, is processed, stored and transformed for input to the Model.
  • Figure 3: Goals proven during training by TacticZero, with different underlying embedding architectures.