Table of Contents
Fetching ...

GenesisGeo: Technical Report

Minfeng Zhu, Zi Wang, Sizhe Ji, Zhengtong Du, Shengqiang Tai, Junming Ke, Xiao Deng, Zanlang Yin, Xiuqi Huang, Heyu Wang, Wei Chen

TL;DR

GenesisGeo tackles the gap between perceptual diagram cues and symbolic geometric reasoning. It introduces GenesisGeo-1M, a million-problem multimodal geometry dataset with diagram visuals and machine-checkable proof traces, and proposes a multi-task framework that jointly learns text-based proof generation and diagram-grounded proof generation. Building on this dataset, GenesisGeo-2B—a multimodal neuro-symbolic prover—uses a two-stage learning pipeline and an iterative propose-and-verify loop to integrate diagram understanding with symbolic deduction. Empirical results on IMO-30, IMO-95, and HAGeo-409 demonstrate substantial gains, highlighting the practical potential of large-scale multimodal supervision for connecting perceptual cues to verifiable geometric reasoning.

Abstract

Recent neuro-symbolic geometry theorem provers have made significant progress on Euclidean problems by coupling neural guidance with symbolic verification. However, most existing systems operate almost exclusively in a symbolic space, leaving diagram-based intuition largely unused during reasoning. For humans, geometric diagrams provide essential heuristics for identifying non-trivial auxiliary constructions. Meanwhile, visual language models (VLMs) still struggle with geometry due to the lack of high-quality data with geometric diagrams and reasoning supervision. In this paper, we introduce GenesisGeo-1M, a large-scale synthetic dataset for visual geometric reasoning that contains 1M multimodal geometry problems paired with machine-checkable proof traces. Building on this dataset, we formulate geometric learning as a multi-task training paradigm that jointly optimizes text-based proof generation and diagram-grounded proof generation, encouraging models to learn visual grounding and symbolic deduction. Extensive experiments show that our GenesisGeo-2B model achieves gold-medal-level performance on Olympiad geometry benchmarks, solving 29/30 problems on IMO-30, 63/95 on IMO-95, and 278/409 on HAGeo-409.

GenesisGeo: Technical Report

TL;DR

GenesisGeo tackles the gap between perceptual diagram cues and symbolic geometric reasoning. It introduces GenesisGeo-1M, a million-problem multimodal geometry dataset with diagram visuals and machine-checkable proof traces, and proposes a multi-task framework that jointly learns text-based proof generation and diagram-grounded proof generation. Building on this dataset, GenesisGeo-2B—a multimodal neuro-symbolic prover—uses a two-stage learning pipeline and an iterative propose-and-verify loop to integrate diagram understanding with symbolic deduction. Empirical results on IMO-30, IMO-95, and HAGeo-409 demonstrate substantial gains, highlighting the practical potential of large-scale multimodal supervision for connecting perceptual cues to verifiable geometric reasoning.

Abstract

Recent neuro-symbolic geometry theorem provers have made significant progress on Euclidean problems by coupling neural guidance with symbolic verification. However, most existing systems operate almost exclusively in a symbolic space, leaving diagram-based intuition largely unused during reasoning. For humans, geometric diagrams provide essential heuristics for identifying non-trivial auxiliary constructions. Meanwhile, visual language models (VLMs) still struggle with geometry due to the lack of high-quality data with geometric diagrams and reasoning supervision. In this paper, we introduce GenesisGeo-1M, a large-scale synthetic dataset for visual geometric reasoning that contains 1M multimodal geometry problems paired with machine-checkable proof traces. Building on this dataset, we formulate geometric learning as a multi-task training paradigm that jointly optimizes text-based proof generation and diagram-grounded proof generation, encouraging models to learn visual grounding and symbolic deduction. Extensive experiments show that our GenesisGeo-2B model achieves gold-medal-level performance on Olympiad geometry benchmarks, solving 29/30 problems on IMO-30, 63/95 on IMO-95, and 278/409 on HAGeo-409.

Paper Structure

This paper contains 20 sections, 35 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Data synthesis pipeline. We first sample a construction program in our DSL to instantiate a consistent geometric configuration. Then, our symbolic engine performs forward deduction from the construction predicates to obtain a closure of conclusions. From this closure, we select non-trivial targets via predicate-specific filtering and equivalence deduplication. For each selected goal, we trace back the proof to extract minimal premises and candidate auxiliary constructions, followed by a backward double-check that removes fake auxiliary points. Finally, we export each problem in a unified multimodal format, including the construction DSL and geometry diagram <img>, the predicate DSL <problem>, a refined auxiliary block <aux>, and a verifiable proof trace <proof>.
  • Figure 2: Proof length distribution with auxiliary point types.
  • Figure 3: An example of the synthetic data. The Problem in Construction section provides the problem description in the construction DSL, upon which the Image is rendered. The Problem section contains the problem translated into the predicate DSL, detailing the specific geometric properties of each point. The <aux> section specifies the necessary auxiliary constructions. Finally, the Proof block represents the entire deduction process: the <numerical_check> section contains numerically obtained information and the <proof> session presents the complete sequence of proof steps.
  • Figure 4: Multi-stage and multi-task learning framework. We first conduct proof-oriented continued pre-training on complete solution records to align visual diagrams with the predicate language. Then, we perform supervised multi-task fine-tuning for auxiliary construction, including text-based and vision-grounded settings, enabling robust proposals that integrate symbolic planning with visual grounding.
  • Figure 5: Proving Pipeline of VLM Prover.
  • ...and 2 more figures