Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration

Yicheng Pan, Zhenrong Zhang, Pengfei Hu, Jiefeng Ma, Jun Du, Jianshu Zhang, Quan Liu, Jianqing Gao, Feng Ma

TL;DR

The paper tackles geometry problem solving with multimodal LLMs, addressing the scarcity of high-quality step-by-step data and hallucinations during reasoning. It introduces GeoGen, a symbolic-neural pipeline that automatically generates large-scale, multi-step chain-of-thought (CoT) data from geometry diagrams, and GeoLogic, a bridging model that translates natural language reasoning into formal geometric logic for symbolic verification. By constructing the GeoExpand and GeoSynth datasets, training an MLLM on GeoGen data, and then applying step-level tree search with GeoLogic during inference, the approach achieves state-of-the-art results on several geometry benchmarks and consistently reduces hallucinations. The work demonstrates a scalable framework that leverages symbolic reasoning to augment neural models, improving interpretability and reliability in geometric problem solving, with potential for broader symbolic-neural integration.
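To make the idea of symbolically generated step-by-step data concrete, here is a minimal sketch of forward chaining over angle facts that records every derivation as one reasoning step. The two rules (triangle angle sum, linear pair), the angle naming, and the functions `forward_chain` and `render` are hypothetical illustrations chosen for brevity; they are not GeoGen's actual rule base or formal language.

```python
from dataclasses import dataclass

# Toy illustration only: two hand-written angle rules, not GeoGen's rule base.


@dataclass
class Step:
    conclusion: str            # angle whose value was derived
    value: float               # derived measure in degrees
    rule: str                  # name of the rule that fired
    premises: tuple[str, ...]  # angles the rule used


def forward_chain(known, triangles, linear_pairs):
    """Apply rules until nothing new can be derived; return facts and the ordered trace."""
    facts = dict(known)
    trace = []
    changed = True
    while changed:
        changed = False
        # Rule 1: the three interior angles of a triangle sum to 180 degrees.
        for tri in triangles:
            unknown = [a for a in tri if a not in facts]
            if len(unknown) == 1:
                target = unknown[0]
                premises = tuple(a for a in tri if a != target)
                facts[target] = 180 - sum(facts[a] for a in premises)
                trace.append(Step(target, facts[target], "triangle_angle_sum", premises))
                changed = True
        # Rule 2: two angles forming a linear pair sum to 180 degrees.
        for a, b in linear_pairs:
            for x, y in ((a, b), (b, a)):
                if x in facts and y not in facts:
                    facts[y] = 180 - facts[x]
                    trace.append(Step(y, facts[y], "linear_pair", (x,)))
                    changed = True
    return facts, trace


def render(trace):
    """Turn the symbolic trace into numbered, human-readable reasoning steps."""
    return [
        f"Step {i}: {s.conclusion} = {s.value} by {s.rule} from {', '.join(s.premises)}"
        for i, s in enumerate(trace, 1)
    ]


if __name__ == "__main__":
    # Triangle ABC with angle BAC = 50 and ABC = 60; D extends BC beyond C.
    facts, trace = forward_chain({"BAC": 50, "ABC": 60}, [("BAC", "ABC", "BCA")], [("BCA", "ACD")])
    for line in render(trace):
        print(line)
```

Because every derived fact carries the rule and premises that produced it, the trace is correct by construction, which is the property that makes symbolically generated CoT data attractive compared with model-written solutions.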

Abstract

Recent advances in Multimodal Large Language Models (MLLMs) have made remarkable progress in general domains and shown promise in multimodal mathematical reasoning. However, applying MLLMs to geometry problem solving (GPS) remains challenging due to the lack of accurate step-by-step solution data and severe hallucinations during reasoning. In this paper, we propose GeoGen, a pipeline that automatically generates step-wise reasoning paths for geometry diagrams. By leveraging precise symbolic reasoning, GeoGen produces large-scale, high-quality question-answer pairs. To further enhance the logical reasoning ability of MLLMs, we train GeoLogic, a Large Language Model (LLM), on synthetic data generated by GeoGen. Serving as a bridge between natural language and symbolic systems, GeoLogic enables symbolic tools to help verify MLLM outputs, making the reasoning process more rigorous and alleviating hallucinations. Experimental results show that our approach consistently improves the performance of MLLMs and achieves strong results on geometric reasoning benchmarks. We attribute this improvement to integrating the strengths of LLMs and symbolic systems, which yields a more reliable and interpretable approach to the GPS task. Code is available at https://github.com/ycpNotFound/GeoGen.
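The verification role described for GeoLogic can be sketched as translate-then-check: a natural-language step is mapped to a formal statement, which is then compared with facts that a symbolic engine has derived. In this minimal sketch the translator is a hypothetical regex for a single sentence pattern, standing in for the trained GeoLogic model; the `(angle, degrees)` formal form, the functions `translate` and `verify`, and the status labels are assumptions, not the paper's formal language or interface.

```python
import re
from typing import Optional

# Hypothetical stand-in for GeoLogic: a regex that handles one sentence pattern,
# e.g. "angle ABC is 40" or "angle ABC = 40". The real model is a trained LLM.
_ANGLE_CLAIM = re.compile(
    r"\bangle\s+([A-Z]{3})\s+(?:is|equals|=)\s+(\d+(?:\.\d+)?)", re.IGNORECASE
)


def translate(step: str) -> Optional[tuple[str, float]]:
    """Map a natural-language step to an assumed formal form (angle name, degrees)."""
    match = _ANGLE_CLAIM.search(step)
    if match is None:
        return None
    return match.group(1).upper(), float(match.group(2))


def verify(step: str, facts: dict[str, float], tol: float = 1e-6) -> str:
    """Check a translated claim against symbolically derived facts."""
    formal = translate(step)
    if formal is None:
        return "untranslatable"   # nothing the checker can reason about
    angle, claimed = formal
    if angle not in facts:
        return "unknown"          # the symbolic engine has no value for this angle
    return "verified" if abs(facts[angle] - claimed) <= tol else "contradicted"


if __name__ == "__main__":
    derived = {"BCA": 70.0}  # e.g. produced by a symbolic solver
    print(verify("So angle BCA is 70 degrees.", derived))     # verified
    print(verify("Hence angle BCA = 75 degrees.", derived))   # contradicted
```

Separating translation from checking means a hallucinated numeric claim is caught by the symbolic side rather than by another neural judgment; the "unknown" and "untranslatable" outcomes make explicit where the checker cannot help.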


Paper Structure

This paper contains 22 sections, 1 equation, 4 figures, 3 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Framework of our proposed GeoGen pipeline.
  • Figure 2: We conduct an ablation study to examine the impact of different training data compositions. This figure shows the performance on GeoTest across epochs for each configuration. The T0-T3 settings are defined in the corresponding ablation table.
  • Figure 3: Accuracy trends as we vary the search width during symbolic reasoning in the inference stage. We adopt the same evaluation metrics as before.
  • Figure 4: A typical case with model predictions improving as our methods are progressively integrated.
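Figure 3 varies the search width used at inference. The sketch below shows, under stated assumptions, what such a width parameter controls in a generic step-level search: candidate steps are expanded level by level, steps rejected by a verifier are pruned, and only the first `width` surviving partial paths are kept. The proposer, the verifier (standing in for a GeoLogic-plus-symbolic-solver check), the toy arithmetic task, and the pruning rule are all hypothetical; this is not the paper's algorithm.

```python
from typing import Callable, Optional, Sequence

StepPath = tuple[str, ...]


def step_level_search(
    propose: Callable[[StepPath], Sequence[str]],
    is_valid: Callable[[StepPath, str], bool],
    is_final: Callable[[StepPath], bool],
    width: int,
    max_depth: int,
) -> Optional[StepPath]:
    """Width-limited, level-by-level search over reasoning steps (illustrative sketch)."""
    frontier: list[StepPath] = [()]
    for _ in range(max_depth):
        next_frontier: list[StepPath] = []
        for path in frontier:
            # Candidates are assumed to arrive in the proposer's preference order.
            for step in propose(path):
                if not is_valid(path, step):   # verifier prunes rejected steps
                    continue
                new_path = path + (step,)
                if is_final(new_path):
                    return new_path
                next_frontier.append(new_path)
        if not next_frontier:
            return None
        frontier = next_frontier[:width]       # keep at most `width` partial paths
    return None


# Toy task (hypothetical): reach a total of exactly 10 with steps "+4" or "+3".
def propose(path: StepPath) -> Sequence[str]:
    return ["+4", "+3"]  # a fixed preference order stands in for the MLLM


def total(path: StepPath) -> int:
    return sum(int(s) for s in path)


def is_valid(path: StepPath, step: str) -> bool:
    return total(path) + int(step) <= 10  # stands in for a symbolic check


def is_final(path: StepPath) -> bool:
    return total(path) == 10


if __name__ == "__main__":
    print(step_level_search(propose, is_valid, is_final, width=1, max_depth=3))  # None
    print(step_level_search(propose, is_valid, is_final, width=2, max_depth=3))  # ('+4', '+3', '+3')
```

With width 1 the search commits to the proposer's preferred steps (+4, +4) and dead-ends; width 2 keeps the alternative (+4, +3) alive and reaches the target. Wider searches cost more verifier calls per level, which is the trade-off a width sweep like Figure 3 explores.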