Gold-Medal-Level Olympiad Geometry Solving with Efficient Heuristic Auxiliary Constructions
Boyan Duan, Xiao Liang, Shuai Lu, Yaoxiang Wang, Yelong Shen, Kai-Wei Chang, Ying Nian Wu, Mao Yang, Weizhu Chen, Yeyun Gong
TL;DR
The paper addresses the challenge of IMO-level automated geometry theorem proving by presenting HAGeo, a CPU-only, neural-free framework that uses efficient deduction (DD) and algebraic reasoning (AR) augmented by heuristic auxiliary constructions. It demonstrates gold-medal performance on the IMO-30 benchmark and achieves substantial speedups over neural approaches, while introducing HAGeo-409, a larger, more challenging benchmark with human-difficulty annotations. Key contributions include an optimized geometry-specific language, a faster DDAR engine, and a principled auxiliary-point heuristic that yields robust performance across problem difficulty. These advances offer a scalable, GPU-free path for geometry theorem proving and provide a rigorous benchmark to drive future improvements in the field.
Abstract
Automated theorem proving in Euclidean geometry, particularly for International Mathematical Olympiad (IMO) level problems, remains a major challenge and an important research focus in Artificial Intelligence. In this paper, we present a highly efficient method for geometry theorem proving that runs entirely on CPUs without relying on neural network-based inference. Our initial study shows that a simple random strategy for adding auxiliary points can achieve silver-medal level human performance on IMO. Building on this, we propose HAGeo, a Heuristic-based method for adding Auxiliary constructions in Geometric deduction that solves 28 of 30 problems on the IMO-30 benchmark, achieving gold-medal level performance and surpassing AlphaGeometry, a competitive neural network-based approach, by a notable margin. To evaluate our method and existing approaches more comprehensively, we further construct HAGeo-409, a benchmark consisting of 409 geometry problems with human-assessed difficulty levels. Compared with the widely used IMO-30, our benchmark poses greater challenges and provides a more precise evaluation, setting a higher bar for geometry theorem proving.
