Table of Contents
Fetching ...

Autoformalizing Euclidean Geometry

Logan Murphy, Kaiyu Yang, Jialiang Sun, Zhaoyu Li, Anima Anandkumar, Xujie Si

TL;DR

This work targets autoformalization within Euclidean geometry by introducing a neuro-symbolic framework that blends domain knowledge, an SMT-based symbolic engine, and large language models, instantiated in LeanEuclid to formalize theorems and their diagrammatic reasoning. LeanEuclid combines 173 manually formalized examples (from Euclid's Elements and UniGeo) with a formal system E to model implicit diagrammatic steps, enabling automatic filling of gaps via SMT and automatic semantic evaluation of theorem statements. An automatic evaluation pipeline, E3, checks logical and approximate equivalence between autoformalized and ground-truth statements, revealing both capabilities and limits of current LLMs (GPT-4 and GPT-4V) in this domain. The results show that a nontrivial fraction of theorems and proofs can be autoformalized, but substantial gaps remain, underscoring the value of automatic semantic evaluation and pointing to LeanEuclid as a useful benchmark for advancing autoformalization research in mathematics.

Abstract

Autoformalization involves automatically translating informal math into formal theorems and proofs that are machine-verifiable. Euclidean geometry provides an interesting and controllable domain for studying autoformalization. In this paper, we introduce a neuro-symbolic framework for autoformalizing Euclidean geometry, which combines domain knowledge, SMT solvers, and large language models (LLMs). One challenge in Euclidean geometry is that informal proofs rely on diagrams, leaving gaps in texts that are hard to formalize. To address this issue, we use theorem provers to fill in such diagrammatic information automatically, so that the LLM only needs to autoformalize the explicit textual steps, making it easier for the model. We also provide automatic semantic evaluation for autoformalized theorem statements. We construct LeanEuclid, an autoformalization benchmark consisting of problems from Euclid's Elements and the UniGeo dataset formalized in the Lean proof assistant. Experiments with GPT-4 and GPT-4V show the capability and limitations of state-of-the-art LLMs on autoformalizing geometry problems. The data and code are available at https://github.com/loganrjmurphy/LeanEuclid.

Autoformalizing Euclidean Geometry

TL;DR

This work targets autoformalization within Euclidean geometry by introducing a neuro-symbolic framework that blends domain knowledge, an SMT-based symbolic engine, and large language models, instantiated in LeanEuclid to formalize theorems and their diagrammatic reasoning. LeanEuclid combines 173 manually formalized examples (from Euclid's Elements and UniGeo) with a formal system E to model implicit diagrammatic steps, enabling automatic filling of gaps via SMT and automatic semantic evaluation of theorem statements. An automatic evaluation pipeline, E3, checks logical and approximate equivalence between autoformalized and ground-truth statements, revealing both capabilities and limits of current LLMs (GPT-4 and GPT-4V) in this domain. The results show that a nontrivial fraction of theorems and proofs can be autoformalized, but substantial gaps remain, underscoring the value of automatic semantic evaluation and pointing to LeanEuclid as a useful benchmark for advancing autoformalization research in mathematics.

Abstract

Autoformalization involves automatically translating informal math into formal theorems and proofs that are machine-verifiable. Euclidean geometry provides an interesting and controllable domain for studying autoformalization. In this paper, we introduce a neuro-symbolic framework for autoformalizing Euclidean geometry, which combines domain knowledge, SMT solvers, and large language models (LLMs). One challenge in Euclidean geometry is that informal proofs rely on diagrams, leaving gaps in texts that are hard to formalize. To address this issue, we use theorem provers to fill in such diagrammatic information automatically, so that the LLM only needs to autoformalize the explicit textual steps, making it easier for the model. We also provide automatic semantic evaluation for autoformalized theorem statements. We construct LeanEuclid, an autoformalization benchmark consisting of problems from Euclid's Elements and the UniGeo dataset formalized in the Lean proof assistant. Experiments with GPT-4 and GPT-4V show the capability and limitations of state-of-the-art LLMs on autoformalizing geometry problems. The data and code are available at https://github.com/loganrjmurphy/LeanEuclid.
Paper Structure (81 sections, 4 equations, 11 figures, 2 tables)

This paper contains 81 sections, 4 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: Left: Proposition 1 in Euclid's Elements (Book I). The orange text involves diagrammatic reasoning: Euclid did not explicitly prove the two circles actually intersect, but the reader can use the diagram to implicitly fill in the logical gap. Top right: The model autoformalizes the problem into a formal theorem (proposition_1'), which is evaluated by checking its logical equivalence with the ground truth (proposition_1), leveraging domain knowledge and a symbolic automated reasoning engine based on SMT (satisfiability modulo theories) solvers. Bottom right: A proof autoformalized by the model. Like Euclid's proofs, it does not need to handle diagrammatic reasoning explicitly. Lean can check the proof to identify a list of diagrammatic reasoning gaps, e.g., "intersects BCD ACE". Then, it attempts to fill in all gaps automatically using the symbolic reasoning engine based on SMT solvers.
  • Figure 2: Three steps in the proof in Fig. \ref{['fig:overall']}. Left: Geometric objects and facts. Right: Rules applied to construct new objects and deduce new facts. Each rule has inbound arrows from its preconditions and outbound arrows to its postconditions. Dashed lines: When applying a rule with missing preconditions, we try to fill the gap using a symbolic reasoning engine based on SMT solvers. These implicit steps performed by the symbolic engine can only use non-construction rules, whereas explicit steps performed by humans (or machine learning models) can use any rules.
  • Figure 3: Approximate equivalence checking results for theorems from Elements formalized by GPT-4 (5-shots).
  • Figure :
  • Figure A: The case in Proposition 24 missed by Euclid.
  • ...and 6 more figures