DeepMath - Deep Sequence Models for Premise Selection
Alex A. Alemi, Francois Chollet, Niklas Een, Geoffrey Irving, Christian Szegedy, Josef Urban
TL;DR
The paper tackles premise selection for large-scale automated theorem proving with a two-stage neural pipeline that requires no hand-crafted features. The first stage learns character-level representations of formulas; the second builds word-level embeddings that incorporate symbol definitions, enabling effective axiom-conjecture pairing. Empirical results on the Mizar library show that neural approaches can rival and complement traditional hand-engineered methods, and that ensembles allow the prover to automatically prove a substantial fraction of theorems. The work demonstrates that deep learning is viable for formal reasoning tasks, shows practical gains for theorem-proving pipelines, and outlines avenues for further improvement.
Abstract
We study the effectiveness of neural sequence models for premise selection in automated theorem proving, one of the main bottlenecks in the formalization of mathematics. We propose a two-stage approach for this task that yields good results on the Mizar corpus while avoiding the hand-engineered features of existing state-of-the-art models. To our knowledge, this is the first time deep learning has been applied to theorem proving on a large scale.
