GeoFM: Enhancing Geometric Reasoning of MLLMs via Synthetic Data Generation through Formal Language

Yuhao Zhang, Dingxin Hu, Tinghao Yu, Hao Liu, Yiting Liu

TL;DR

GeoFM tackles the shortage of geometric reasoning data for multi-modal LLMs (MLLMs) by automating geometric data synthesis with formal languages and a symbolic engine. It formalizes seed problems in FormalGeo, generates new problems by combining metric conditions, and produces matched natural-language instructions and high-fidelity diagrams through an automated pipeline (including a GMBL-based diagram generator). The authors introduce the GeoFM80K dataset and show that training on GeoFM data yields substantial gains over strong baselines: the trained model surpasses GPT-4o by 18.7% on MathVista GPS and by 16.5% on GeoQA, and GeoFM data brings further improvements when it augments existing open-source datasets. The work also demonstrates robustness to distribution shifts and outperforms rule-based synthetic datasets such as MAVIS-Geometry in multiple settings. Overall, GeoFM advances geometric reasoning for MLLMs and offers a scalable, practical route to large-scale, high-quality geometric data generation.
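A high-fidelity diagram must satisfy the metric conditions stated in its problem. The GMBL-based generator is not described in enough detail here to reproduce. The sketch below therefore only illustrates the general idea under simple assumptions: a triangle given by two angles and one side is turned into coordinates with the law of sines, then rendered as SVG. The function names `triangle_from_conditions` and `to_svg` are hypothetical and are not part of GeoFM or GMBL.

```python
import math


def triangle_from_conditions(angle_a: float, angle_b: float, side_ab: float) -> dict:
    """Place triangle ABC so that angle A, angle B (in degrees) and |AB| hold.

    Hypothetical helper: A sits at the origin, B on the positive x-axis,
    and C is located with the law of sines.
    """
    angle_c = 180.0 - angle_a - angle_b
    if angle_a <= 0 or angle_b <= 0 or angle_c <= 0 or side_ab <= 0:
        raise ValueError("conditions do not describe a valid triangle")
    # Law of sines: |AC| / sin(B) = |AB| / sin(C)
    side_ac = side_ab * math.sin(math.radians(angle_b)) / math.sin(math.radians(angle_c))
    a_rad = math.radians(angle_a)
    return {
        "A": (0.0, 0.0),
        "B": (side_ab, 0.0),
        "C": (side_ac * math.cos(a_rad), side_ac * math.sin(a_rad)),
    }


def to_svg(points: dict, scale: float = 40.0, margin: float = 20.0) -> str:
    """Render labelled points as a closed polygon (SVG's y-axis points down)."""
    xs = [x for x, _ in points.values()]
    ys = [y for _, y in points.values()]
    min_x, max_y = min(xs), max(ys)

    def transform(x: float, y: float) -> tuple:
        # Shift into the margin and flip y so the drawing is upright.
        return (margin + (x - min_x) * scale, margin + (max_y - y) * scale)

    width = 2 * margin + (max(xs) - min_x) * scale
    height = 2 * margin + (max_y - min(ys)) * scale
    coords = [transform(x, y) for x, y in points.values()]
    polygon = " ".join(f"{x:.1f},{y:.1f}" for x, y in coords)
    labels = "".join(
        f'<text x="{x:.1f}" y="{y:.1f}">{name}</text>'
        for name, (x, y) in zip(points, coords)
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width:.0f}" height="{height:.0f}">'
        f'<polygon points="{polygon}" fill="none" stroke="black"/>{labels}</svg>'
    )
```

For example, `to_svg(triangle_from_conditions(50, 60, 5))` draws a triangle whose angles at A and B really measure 50° and 60°. Because the figure is computed from the conditions rather than pasted from a template, it stays consistent with the problem text.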

Abstract

Multi-modal Large Language Models (MLLMs) have gained significant attention in both academia and industry for their ability to handle multi-modal tasks. However, these models struggle with geometric mathematical reasoning because high-quality geometric data is scarce. To address this issue, synthetic geometric data has become an essential strategy. Current methods for generating synthetic geometric data either rephrase or expand existing problems, or use predefined rules and templates to create geometric images and problems. However, these approaches often produce data that lacks diversity or is prone to noise. Additionally, the geometric images synthesized by existing methods tend to exhibit limited variation and deviate significantly from authentic geometric diagrams. To overcome these limitations, we propose GeoFM, a novel method for synthesizing geometric data. GeoFM uses formal languages to explore combinations of conditions within the metric space, generating high-fidelity geometric problems that differ from the originals while ensuring correctness through a symbolic engine. Experimental results show that our synthetic data significantly outperforms existing methods. The model trained with our data surpasses the proprietary GPT-4o model by 18.7% on geometry problem-solving tasks in MathVista and by 16.5% on GeoQA. Additionally, it exceeds the performance of a leading open-source model by 5.7% on MathVista and by 2.7% on GeoQA.
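The idea of exploring combinations of conditions and letting a symbolic engine guarantee correctness can be illustrated with a toy model. The sketch below is not FormalGeo and not the authors' engine. It uses an assumed configuration: triangle ABC with D on the extension of BC. Theorems are linear equations over angle measures. The candidate given values come from one consistent reference configuration (also an assumption). Each subset of candidates is kept as a new problem only if exact Gaussian elimination proves that it uniquely determines the target. Subsets that contain an already-sufficient smaller subset are skipped as redundant, which is a design choice made for this illustration.

```python
from fractions import Fraction
from itertools import combinations

# Toy configuration (assumption): triangle ABC, with D on the extension of BC.
VARIABLES = ["A", "B", "BCA", "ACD"]
THEOREMS = [
    ({"A": 1, "B": 1, "BCA": 1}, 180),  # angle sum of a triangle
    ({"BCA": 1, "ACD": 1}, 180),        # B, C, D collinear: adjacent angles are supplementary
]
# Hypothetical metric conditions, all taken from one consistent reference figure.
CANDIDATES = {"A": 50, "B": 60, "BCA": 70}


def rref(equations, variables):
    """Reduced row echelon form of the augmented matrix, using exact fractions."""
    m = [[Fraction(eq.get(v, 0)) for v in variables] + [Fraction(c)] for eq, c in equations]
    pivot_row = 0
    for col in range(len(variables)):
        found = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if found is None:
            continue
        m[pivot_row], m[found] = m[found], m[pivot_row]
        pivot = m[pivot_row][col]
        m[pivot_row] = [x / pivot for x in m[pivot_row]]
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return m


def determined_value(equations, variables, target):
    """Return the target's value if the equations force it, else None."""
    t = variables.index(target)
    for *coefs, rhs in rref(equations, variables):
        if all(c == 0 for c in coefs):
            if rhs != 0:
                raise ValueError("inconsistent conditions")
            continue
        # The target is forced iff some reduced row mentions only the target.
        if coefs[t] != 0 and all(c == 0 for i, c in enumerate(coefs) if i != t):
            return rhs / coefs[t]
    return None


def synthesize(target, max_size=2):
    """Enumerate minimal subsets of candidate conditions that determine `target`."""
    found = []
    names = [n for n in CANDIDATES if n != target]
    for k in range(1, max_size + 1):
        for subset in combinations(names, k):
            if any(set(prev) <= set(subset) for prev, _ in found):
                continue  # a smaller subset already suffices, so this one is redundant
            givens = [({n: 1}, CANDIDATES[n]) for n in subset]
            value = determined_value(THEOREMS + givens, VARIABLES, target)
            if value is not None:
                found.append((subset, value))
    return found
```

Here `synthesize("ACD")` yields two distinct problems for the same target: one gives angle BCA, the other gives angles A and B. Both are verified to have the answer 110. Single conditions such as "A = 50" are rejected because they do not determine the target.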

Paper Structure

This paper contains 26 sections, 7 figures, 6 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Comparison of different methods for synthesizing geometric data. (a) Generate geometric Q&A data by using MLLMs to rephrase existing problems or create new Q&A from collected geometric images. (b) Utilize a rule-based data engine to generate template-based Q&A and low-fidelity images. (c) Employ formal language to explore the combinations of geometric metric conditions and synthesize new problems, ensuring solution accuracy through symbolic reasoning, and generate high-fidelity geometric images.
  • Figure 2: The framework of GeoFM for geometric data synthesis
  • Figure 3: Comparison with existing synthetic geometric datasets at different data scales using LLaVA-NeXT-8B. The baseline corresponds to the performance of the original model.
  • Figure 4: Demonstration of geometric problem solving using GPT-4o and GeoFM-8B
  • Figure 5: Conversion of a synthesized formal-language geometric problem into natural-language instruction data
  • ...and 2 more figures
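Figure 5's step, turning a formal-language problem into natural-language instruction data, can be approximated with simple templates. The sketch below assumes predicate strings written in the style of FormalGeo condition declarations, such as `Equal(MeasureOfAngle(ABC),40)` and `Value(LengthOfLine(BC))`. The template tables, their wording, and the function names are hypothetical. The paper's actual conversion may use richer templates or a language model.

```python
import re

# Hypothetical templates: quantity predicate -> (phrase, unit suffix).
QUANTITY_TEMPLATES = {
    "MeasureOfAngle": ("the measure of angle {0}", "°"),
    "LengthOfLine": ("the length of segment {0}", ""),
}
# Hypothetical templates for two-argument relation predicates.
RELATION_TEMPLATES = {
    "ParallelBetweenLine": "line {0} is parallel to line {1}",
}

CONDITION_RE = re.compile(r"Equal\((\w+)\((\w+)\),([\d.]+)\)")  # e.g. Equal(LengthOfLine(AB),5)
RELATION_RE = re.compile(r"(\w+)\((\w+),(\w+)\)")               # e.g. ParallelBetweenLine(AD,BC)
GOAL_RE = re.compile(r"Value\((\w+)\((\w+)\)\)")                 # e.g. Value(MeasureOfAngle(CAB))


def describe(func: str, arg: str) -> tuple:
    """Look up the natural-language phrase and unit for a quantity predicate."""
    if func not in QUANTITY_TEMPLATES:
        raise ValueError(f"no template for quantity {func}")
    phrase, unit = QUANTITY_TEMPLATES[func]
    return phrase.format(arg), unit


def to_instruction(conditions: list, goal: str) -> str:
    """Turn formal conditions and a goal into a single question string."""
    sentences = []
    for cond in conditions:
        if m := CONDITION_RE.fullmatch(cond):
            func, arg, value = m.groups()
            phrase, unit = describe(func, arg)
            sentences.append(f"{phrase} is {value}{unit}")
        elif (m := RELATION_RE.fullmatch(cond)) and m.group(1) in RELATION_TEMPLATES:
            sentences.append(RELATION_TEMPLATES[m.group(1)].format(m.group(2), m.group(3)))
        else:
            raise ValueError(f"unsupported condition: {cond}")
    g = GOAL_RE.fullmatch(goal)
    if g is None:
        raise ValueError(f"unsupported goal: {goal}")
    target, _ = describe(*g.groups())
    return f"As shown in the figure, {'; '.join(sentences)}. Find {target}."
```

For example, `to_instruction(["Equal(MeasureOfAngle(ABC),40)", "ParallelBetweenLine(AD,BC)"], "Value(LengthOfLine(BC))")` produces "As shown in the figure, the measure of angle ABC is 40°; line AD is parallel to line BC. Find the length of segment BC." Any predicate without a template raises an error instead of emitting an unfaithful sentence.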