The Tactician's Web of Large-Scale Formal Knowledge
Lasse Blaauwbroek
TL;DR
The paper tackles enabling scalable learning over large-scale formal knowledge by encoding Coq's Calculus of Inductive Constructions as a rich semantic graph and pairing it with a dual text representation. It introduces The Tactician's Web platform, including a formal graph translation, meta-context (meta-graph) handling, tactic-based proofs, and extensive data-product tooling (datasets, APIs, and benchmarking). The evaluation demonstrates substantial graph-sharing-driven compression, mixed results for reconstruction via text vs graph representations, and competitive neural models (Graph2Tac) that leverage graph-informed features. The work emphasizes practical integration with Coq through prediction servers and proving clients, enabling massively parallel benchmarking and real-world proof automation. Overall, it provides a comprehensive framework for data-driven proof engineering, combining rigorous formal representations with scalable ML-friendly workflows.
Abstract
The Tactician's Web is a platform offering a large web of strongly interconnected, machine-checked, formal mathematical knowledge conveniently packaged for machine learning, analytics, and proof engineering. Built on top of the Coq proof assistant, the platform exports a dataset containing a wide variety of formal theories, presented as a web of definitions, theorems, proof terms, tactics, and proof states. Theories are encoded both as a semantic graph (rendered below) and as human-readable text, each with a unique set of advantages and disadvantages. Proving agents may interact with Coq through the same rich data representation and can be automatically benchmarked on a set of theorems. Tight integration with Coq provides the unique possibility to make agents available to proof engineers as practical tools.
