Geo-Code: A Code Framework for Reverse Code Generation from Geometric Images Based on Two-Stage Multi-Agent Evolution

Zhenyu Wu, Yanxi Long, Jian Li, Hua Huang

TL;DR

Geo-code introduces a two-stage multi-agent evolution framework that converts geometric images into executable code. Phase I grounds geometry via Pixel-wise Anchoring and semantic verification, while Phase II uses a synthesis-rendering-validation loop with Visual Error Projection to iteratively refine the code. The approach achieves state-of-the-art geometric fidelity and visual consistency, and preserves or enhances downstream multimodal reasoning performance. The authors release a rigorously verified Geo-code Dataset (~1,510 samples) and a fine-tuned GeoCodeLM model, providing strong foundations for future research in inverse graphics and geometry-based visual reasoning.

Abstract

Program code serves as a bridge between vision and logic, and offers a feasible form of supervision for improving the multimodal reasoning of large models through geometric operations such as auxiliary line construction and perspective transformation. However, current inverse graphics methods struggle to reconstruct complex geometric details accurately, which often leads to lost geometric constraints or structural distortion. To address this bottleneck, we propose Geo-coder, the first inverse programming framework for geometric images based on a multi-agent system. Our method decouples the process into two stages: geometric modeling via pixel-wise anchoring, and metric-driven code evolution. Stage 1 combines the complementary strengths of visual operators and large models to capture pixel coordinates and visual attributes precisely. Stage 2 introduces a closed synthesis-rendering-validation loop in which bidirectional visual feedback drives the self-correction of the code. Extensive experiments show that Geo-coder achieves a substantial lead in both geometric reconstruction accuracy and visual consistency. Notably, because the core geometric semantics are preserved, images reconstructed with our method perform on par with the originals in multimodal reasoning tasks, which supports the robustness of the framework. Finally, to reduce research costs, we have open-sourced the Geo-coder dataset, constructed with the GeoCode framework, which contains more than 1,500 samples. On this basis, we have also open-sourced the GeocodeLM model, providing a data and model foundation for subsequent research in this field.
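To make the Stage 2 idea concrete, here is a minimal toy sketch of a synthesis-rendering-validation loop. It is not the authors' implementation: the "code" being refined is just the four integer bounds of an axis-aligned rectangle, the renderer is a pixel-set rasterizer, and the correction step is a greedy one-pixel search that stands in for the paper's agents. All names (`render`, `validate`, `refine`) are hypothetical. What it does illustrate is the two-directional comparison: pixels present in the target but absent from the render, and pixels rendered but absent from the target.

```python
from itertools import product


def render(rect):
    """Rasterize a filled rectangle (x0, y0, x1, y1), bounds inclusive, to a pixel set."""
    x0, y0, x1, y1 = rect
    return set(product(range(x0, x1 + 1), range(y0, y1 + 1)))


def validate(rendered, target):
    """Compare in both directions: what the render lacks and what it adds."""
    missing = target - rendered   # in the target image, not drawn by the code
    extra = rendered - target     # drawn by the code, not in the target image
    return missing, extra


def error(rect, target):
    """Scalar metric driving refinement: size of the symmetric pixel difference."""
    missing, extra = validate(render(rect), target)
    return len(missing) + len(extra)


def neighbours(rect):
    """Candidate corrections: move one bound by one pixel, keeping the rectangle valid."""
    for i in range(4):
        for step in (-1, 1):
            cand = list(rect)
            cand[i] += step
            if cand[0] <= cand[2] and cand[1] <= cand[3]:
                yield tuple(cand)


def refine(initial, target, max_iters=50):
    """Greedy stand-in for the correction step (an assumption, not the paper's method)."""
    rect = initial
    history = [error(rect, target)]
    for _ in range(max_iters):
        if history[-1] == 0:
            break                                  # render matches the target exactly
        best = min(neighbours(rect), key=lambda r: error(r, target))
        best_err = error(best, target)
        if best_err >= history[-1]:
            break                                  # no single-pixel move improves the metric
        rect = best
        history.append(best_err)
    return rect, history
```

Calling `refine((1, 1, 6, 9), render((2, 3, 8, 7)))` starts from a guess that overlaps the target and returns the refined bounds together with the per-iteration error history. A greedy search like this can stall when the initial guess barely overlaps the target, which is one reason a real system would need a stronger first-stage estimate before refinement.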

Paper Structure (34 sections, 6 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Comparison plot of different reconstruction methods. For more examples, refer to the Appendix.
  • Figure 2: The Geo-code framework operates in two strategic phases. Phase 1 (Geometric Modeling via Anchoring) employs the Geometric Extraction Agent (1.2) to derive geometric attributes via Pixel-wise Anchoring, followed by the Visual Verification Agent (1.3) to synthesize the skeleton. Phase 2 translates this skeleton into executable code, utilizing the Hybrid Inspection and Reflective Correction Agents (2.2) to iteratively refine the output using the Code Evolution via Visual Error Projection mechanism for precise alignment.
  • Figure 3: Detailed inference accuracy for each individual dataset across four benchmarks.
  • Figure 4: Comparison of Reconstructed Images (GeoQA No. 157)
  • Figure 5: Comparison of Reconstructed Images (GeoQA No. 266)
  • ...and 2 more figures