Diagram Formalization Enhanced Multi-Modal Geometry Problem Solver

Zeren Zhang, Jo-Ku Cheng, Jingyang Deng, Lu Tian, Jinwen Ma, Ziran Qin, Xiaokai Zhang, Na Zhu, Tuo Leng

TL;DR

This paper introduces the Diagram Formalization Enhanced Geometry Problem Solver (DFE-GPS), a framework that integrates visual features, geometric formal language, and natural language representations. It improves MLLMs' ability to process geometric diagrams and extends their application to open-ended tasks on the formalgeo7k dataset.

Abstract

Mathematical reasoning remains an ongoing challenge for AI models, especially for geometry problems that require both linguistic and visual signals. As the vision encoders of most MLLMs are trained on natural scenes, they often struggle to understand geometric diagrams, performing no better in geometry problem solving than LLMs that only process text. This limitation is amplified by the lack of effective methods for representing geometric relationships. To address these issues, we introduce the Diagram Formalization Enhanced Geometry Problem Solver (DFE-GPS), a new framework that integrates visual features, geometric formal language, and natural language representations. We propose a novel synthetic data approach and create a large-scale geometric dataset, SynthGeo228K, annotated with both formal and natural language captions, designed to enhance the vision encoder for a better understanding of geometric structures. Our framework improves MLLMs' ability to process geometric diagrams and extends their application to open-ended tasks on the formalgeo7k dataset.

Paper Structure (13 sections, 1 equation, 2 figures, 5 tables)

Figures (2)

  • Figure 1: Comparative performance analysis of MLLMs: Impact of diagram integration
  • Figure 2: Our proposed geometric diagram generation pipeline (a), diagram formalization enhanced geometry problem solver (b) and process evaluation score (c). The three-stage training overview is illustrated in (d, e, f).