Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information

Junbo Zhao, Ting Zhang, Jiayu Sun, Mi Tian, Hua Huang

TL;DR

Pi-GPS tackles textual ambiguity in geometry problem solving by exploiting diagrammatic information through a rectifier and a verifier that interact with a diagram parser. The rectifier uses multimodal LLMs (MLLMs) to disambiguate text conditioned on the diagram, while the verifier checks the rectified output against diagram heuristics to reduce hallucinations. A theorem predictor based on an advanced LLM guides the reasoning with an expanded geometry theorem library, and a solver applies the theorems to produce explanations. Empirically, Pi-GPS achieves state-of-the-art performance on Geometry3K and PGPS9K, with nearly 10% gains over prior neural-symbolic approaches, highlighting the importance of resolving text ambiguity in multimodal mathematical reasoning.

Abstract

Geometry problem solving has garnered increasing attention due to its potential applications in intelligent education. Inspired by the observation that text often introduces ambiguities that diagrams can clarify, this paper presents Pi-GPS, a novel framework that unleashes the power of diagrammatic information to resolve textual ambiguities, an aspect largely overlooked in prior research. Specifically, we design a micro module comprising a rectifier and a verifier: the rectifier employs multimodal LLMs (MLLMs) to disambiguate text based on the diagrammatic context, while the verifier ensures that the rectified output adheres to geometric rules, mitigating model hallucinations. Additionally, we explore the impact of LLMs in the theorem predictor based on the disambiguated formal language. Empirical results demonstrate that Pi-GPS surpasses state-of-the-art models, achieving a nearly 10% improvement on Geometry3K over prior neural-symbolic approaches. We hope this work highlights the significance of resolving textual ambiguity in multimodal mathematical reasoning, a crucial factor limiting performance.

Paper Structure

This paper contains 11 sections, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Illustration of the ambiguity present in text. Text alone offers insufficient information to resolve the ambiguity, whereas disambiguation becomes straightforward when supported by a diagram.
  • Figure 2: The pipeline of Pi-GPS. The overall framework is shown on the left; the text disambiguation module, which resolves text ambiguity and thereby improves performance, is depicted on the right. REX stands for regex pattern matching.
  • Figure 3: Several examples showing that the proposed text disambiguation module is capable of resolving text ambiguity.
  • Figure 4: The effect of different MLLMs used in the rectifier within the text disambiguation module.
  • Figure 5: Limitations of the current GPS framework.