NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation
Weiming Wu, Jin Ye, Zi-kang Wang, Zhi Zhou, Yu-Feng Li, Lan-Zhe Guo
TL;DR
NeSyGeo tackles the data scarcity and misalignment challenges in multimodal geometric reasoning by introducing a neuro-symbolic data-generation framework. It defines a Geo-DSL for plane geometry, paired with a bidirectional conversion pipeline and a two-stage CoT generator (Reasoner and Verifier) to produce valid Q&A and reasoning paths, then maps symbolic outputs to images and text via Painter and Translator with information orthogonality. The approach yields 100k labeled samples across NeSyGeo-Caption and NeSyGeo-CoT, plus a 2,668-sample NeSyGeo-Test benchmark, and demonstrates consistent improvements across MathVision, MathVerse, and GeoQA under RL and SFT, including cases where a 4B model exceeds an 8B sibling on geometric tasks. Overall, NeSyGeo provides high-quality, diverse, and numerically grounded multimodal geometric data that strengthens visual grounding and cross-modal reasoning in MLLMs, with reproducibility and public dataset release enabling broader advancement in geometric reasoning research.
Abstract
Obtaining large-scale, high-quality reasoning data is crucial for improving the geometric reasoning capabilities of multi-modal large language models (MLLMs). However, existing data generation methods, whether based on predefined templates or constrained symbolic provers, inevitably face diversity and numerical generalization limitations. To address these limitations, we propose NeSyGeo, a novel neuro-symbolic framework for generating geometric reasoning data. First, we propose a domain-specific language grounded in the entity-attributes-relations paradigm to comprehensively represent all components of plane geometry, along with generative actions defined within this symbolic space. We then design a symbolic-visual-text pipeline that synthesizes symbolic sequences, maps them to visual and textual representations, and generates reasoning paths with reverse search and forward validation. Based on this framework, we construct the NeSyGeo-CoT and NeSyGeo-Caption datasets, containing 100k samples, and release a new benchmark, NeSyGeo-Test, for evaluating geometric reasoning abilities in MLLMs. Experiments demonstrate that the proposal significantly and consistently improves the performance of multiple MLLMs under both reinforcement and supervised fine-tuning. With only 4k samples and two epochs of reinforcement fine-tuning, base models achieve improvements of up to +15.8% on MathVision, +8.4% on MathVerse, and +7.3% on GeoQA. Notably, a 4B model can be improved to outperform an 8B model from the same series on geometric reasoning tasks.
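
As a rough illustration of the entity-attributes-relations idea and of forward validation, here is a minimal Python sketch. It is not the paper's domain-specific language. The names `Entity`, `Relation`, `length`, `is_perpendicular`, and `forward_validate`, as well as the `xy` coordinate attribute, are hypothetical and chosen only for this example. The sketch assumes that validation can be done by recomputing a quantity numerically from the symbolic state and comparing it with the value claimed by a reasoning path.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str                                        # e.g. "A"
    kind: str                                        # e.g. "point"
    attributes: dict = field(default_factory=dict)   # e.g. {"xy": (0.0, 3.0)}


@dataclass
class Relation:
    kind: str      # e.g. "perpendicular"
    args: tuple    # names of the segments the relation connects


# Toy symbolic state (hypothetical): triangle ABC with a right angle at B.
points = {
    "A": Entity("A", "point", {"xy": (0.0, 3.0)}),
    "B": Entity("B", "point", {"xy": (0.0, 0.0)}),
    "C": Entity("C", "point", {"xy": (4.0, 0.0)}),
}
relations = [Relation("perpendicular", ("AB", "BC"))]


def length(segment: str) -> float:
    """Euclidean length of a segment named by its two endpoint letters."""
    p, q = (points[c].attributes["xy"] for c in segment)
    return math.dist(p, q)


def is_perpendicular(seg1: str, seg2: str, tol: float = 1e-9) -> bool:
    """Check a 'perpendicular' relation via the dot product of direction vectors."""
    def direction(seg):
        (x0, y0), (x1, y1) = (points[c].attributes["xy"] for c in seg)
        return (x1 - x0, y1 - y0)
    (ax, ay), (bx, by) = direction(seg1), direction(seg2)
    return abs(ax * bx + ay * by) <= tol


def forward_validate(segment: str, claimed: float, tol: float = 1e-9) -> bool:
    """Accept a claimed length only if recomputing it from the state agrees."""
    return math.isclose(length(segment), claimed, abs_tol=tol)


# Every declared relation should hold in the numeric state.
all_relations_hold = all(
    is_perpendicular(*r.args) for r in relations if r.kind == "perpendicular"
)
```

In this toy setting, a reasoning path that concludes AC = 5 would pass `forward_validate("AC", 5.0)`, while a path concluding AC = 6 would be rejected.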
