GenesisGeo: Technical Report
Minfeng Zhu, Zi Wang, Sizhe Ji, Zhengtong Du, Shengqiang Tai, Junming Ke, Xiao Deng, Zanlang Yin, Xiuqi Huang, Heyu Wang, Wei Chen
TL;DR
GenesisGeo tackles the gap between perceptual diagram cues and symbolic geometric reasoning. It introduces GenesisGeo-1M, a million-problem multimodal geometry dataset with diagram visuals and machine-checkable proof traces, and proposes a multi-task framework that jointly learns text-based proof generation and diagram-grounded proof generation. Building on this dataset, GenesisGeo-2B—a multimodal neuro-symbolic prover—uses a two-stage learning pipeline and an iterative propose-and-verify loop to integrate diagram understanding with symbolic deduction. Empirical results on IMO-30, IMO-95, and HAGeo-409 demonstrate substantial gains, highlighting the practical potential of large-scale multimodal supervision for connecting perceptual cues to verifiable geometric reasoning.
Abstract
Recent neuro-symbolic geometry theorem provers have made significant progress on Euclidean problems by coupling neural guidance with symbolic verification. However, most existing systems operate almost exclusively in a symbolic space, leaving diagram-based intuition largely unused during reasoning. For humans, geometric diagrams provide essential heuristics for identifying non-trivial auxiliary constructions. Meanwhile, visual language models (VLMs) still struggle with geometry due to the lack of high-quality data with geometric diagrams and reasoning supervision. In this paper, we introduce GenesisGeo-1M, a large-scale synthetic dataset for visual geometric reasoning that contains 1M multimodal geometry problems paired with machine-checkable proof traces. Building on this dataset, we formulate geometric learning as a multi-task training paradigm that jointly optimizes text-based proof generation and diagram-grounded proof generation, encouraging models to learn visual grounding and symbolic deduction. Extensive experiments show that our GenesisGeo-2B model achieves gold-medal-level performance on Olympiad geometry benchmarks, solving 29/30 problems on IMO-30, 63/95 on IMO-95, and 278/409 on HAGeo-409.
