
Theorem-Validated Reverse Chain-of-Thought Problem Generation for Geometric Reasoning

Linger Deng, Linghao Zhu, Yuliang Liu, Yu Wang, Qunyi Xie, Jingjing Wu, Gang Zhang, Yingying Zhu, Xiang Bai

TL;DR

The paper tackles weak geometric reasoning in large multimodal models (LMMs), a gap caused by training data that contains little geometry. It introduces TR-CoT, a two-stage framework. TR-Engine synthesizes theorem-grounded images together with their descriptions, and TR-Reasoner performs reverse, stepwise verification to produce Q&A pairs, yielding diverse, theorem-aware data. Ablations and cross-dataset evaluations on the resulting TR-GeoMM and TR-GeoSup datasets show clear gains in geometric problem solving. The best models surpass the baselines by 10.1% on MathVista and 4.7% on GeoQA and outperform closed-source models such as GPT-4o. The work shows that explicit, theorem-driven reasoning combined with reverse validation can substantially strengthen structured geometric CoT, and the approach may carry over to other mathematical domains.

Abstract

Large Multimodal Models (LMMs) face limitations in geometric reasoning due to insufficient Chain of Thought (CoT) image-text training data. While existing approaches leverage template-based or LLM-assisted methods for geometric CoT data creation, they often face challenges in achieving both diversity and precision. To bridge this gap, we introduce a two-stage Theorem-Validated Reverse Chain-of-Thought Reasoning Synthesis (TR-CoT) framework. The first stage, TR-Engine, synthesizes theorem-grounded geometric diagrams with structured descriptions and properties. The second stage, TR-Reasoner, employs reverse reasoning to iteratively refine question-answer pairs by cross-validating geometric properties and description fragments. Our approach expands theorem-type coverage, corrects long-standing misunderstandings, and enhances geometric reasoning. Fine-grained CoT improves theorem understanding and increases logical consistency by 24.5%. Our best models surpass the baselines in MathVista and GeoQA by 10.1% and 4.7%, outperforming advanced closed-source models like GPT-4o.
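
To make the TR-Engine stage more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation. The substrate is a hypothetical right triangle, the theorem-driven injected element is the altitude to the hypotenuse, and the "property computation" step checks the constructed diagram against the theorem CD = CA·CB/AB. The names `Diagram`, `right_triangle_substrate`, and `inject_altitude` are illustrative only.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Diagram:
    """Hypothetical container for one synthesized diagram."""
    points: Dict[str, Point] = field(default_factory=dict)
    description: List[str] = field(default_factory=list)  # text fragments
    properties: Dict[str, float] = field(default_factory=dict)  # computed values


def dist(p: Point, q: Point) -> float:
    return math.hypot(p[0] - q[0], p[1] - q[1])


def right_triangle_substrate(a: float, b: float) -> Diagram:
    """Assumed substrate: right triangle ABC with the right angle at C."""
    d = Diagram()
    d.points = {"C": (0.0, 0.0), "A": (float(a), 0.0), "B": (0.0, float(b))}
    d.description.append(f"Triangle ABC has a right angle at C, CA = {a:g}, CB = {b:g}.")
    d.properties["AB"] = dist(d.points["A"], d.points["B"])
    return d


def inject_altitude(d: Diagram) -> Diagram:
    """Inject the altitude from C to hypotenuse AB, then compute and validate properties."""
    (ax, ay), (bx, by), (cx, cy) = d.points["A"], d.points["B"], d.points["C"]
    ux, uy = bx - ax, by - ay
    # Project C onto line AB to get the foot D of the perpendicular.
    t = ((cx - ax) * ux + (cy - ay) * uy) / (ux * ux + uy * uy)
    foot = (ax + t * ux, ay + t * uy)
    d.points["D"] = foot
    d.description.append("CD is perpendicular to AB at D.")
    d.properties["CD"] = dist(d.points["C"], foot)
    d.properties["AD"] = dist(d.points["A"], foot)
    # Theorem check: the altitude to the hypotenuse equals CA * CB / AB.
    ca, cb = dist(d.points["C"], d.points["A"]), dist(d.points["C"], d.points["B"])
    if not math.isclose(d.properties["CD"], ca * cb / d.properties["AB"]):
        raise ValueError("injected element violates the altitude theorem")
    return d


diagram = inject_altitude(right_triangle_substrate(3, 4))
print(diagram.description)
print({k: round(v, 6) for k, v in diagram.properties.items()})
```

For legs 3 and 4, the computed properties are AB = 5, CD = 2.4, and AD = 1.8, up to floating-point rounding. Raising an error on a failed theorem check mirrors the idea of validating each injected element during generation instead of trusting the construction blindly.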

Paper Structure

This paper contains 33 sections, 2 equations, 16 figures, 9 tables, and 1 algorithm.

Figures (16)

  • Figure 1: Comparison of TR-CoT with existing CoT data generation approaches. (a) Rephrasing existing Q&A pairs with LLMs, which depends on existing CoT data. (b) Generating images and CoT data from predefined templates that cover a limited number of theorems. (c) Generating CoT with LMMs, whose accuracy is limited by the LMMs' own performance. (d) TR-CoT: the TR-Engine generates images, corresponding descriptions, and geometric properties from theorems, and the descriptions and properties are fed to the TR-Reasoner to generate reliable CoT Q&A pairs.
  • Figure 2: The TR-Engine generates diverse images, corresponding descriptions, and geometric properties step by step based on geometric theorems. The TR-Reasoner then derives accurate geometric Q&A pairs from the descriptions and properties.
  • Figure 3: Overview of the TR-Engine. The engine starts from a Geometric Substrate Library, dynamically injects elements based on theorems, and integrates a property computation module that enables multi-step geometric reasoning and validation during image generation.
  • Figure 4: Overview of the TR-Reasoner. Image descriptions are segmented into patches to generate single-step reasoning results. Single-step reasoning results are fused progressively to get multi-step reasoning results. Then questions are generated based on the multi-step reasoning results. Finally, Q&A pairs that contradict geometric properties are filtered.
  • Figure 5: Diversity analysis of TR-GeoMM.
  • ...and 11 more figures
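
The TR-Reasoner flow in Figure 4 (description fragments, single-step reasoning results, progressive fusion into multi-step results, question generation, and filtering against geometric properties) can be sketched as a toy, rule-based stand-in. This rests on an assumption the source does not make: each description fragment is taken as already parsed into known quantities, and each single-step reasoning result comes from a hand-written theorem rule. `RULES`, `fuse`, `make_qa`, and `consistent` are hypothetical names, and the numbers reuse a right triangle with legs CA = 3 and CB = 4.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical single-step rules: (theorem name, input quantities, output quantity, formula).
Rule = Tuple[str, Tuple[str, ...], str, Callable[..., float]]

RULES: List[Rule] = [
    ("Pythagorean theorem", ("CA", "CB"), "AB", lambda a, b: math.hypot(a, b)),
    ("right-triangle altitude theorem", ("CA", "CB", "AB"), "CD", lambda a, b, c: a * b / c),
    ("leg projection theorem", ("CA", "AB"), "AD", lambda a, c: a * a / c),
]


def fuse(known: Dict[str, float]) -> Tuple[Dict[str, float], List[str]]:
    """Progressively chain single-step results until nothing new can be derived."""
    known = dict(known)
    steps: List[str] = []
    changed = True
    while changed:
        changed = False
        for name, inputs, output, formula in RULES:
            if output not in known and all(k in known for k in inputs):
                known[output] = formula(*(known[k] for k in inputs))
                steps.append(f"By the {name}, {output} = {known[output]:g}.")
                changed = True
    return known, steps


def make_qa(fragments: List[Dict[str, float]], target: str) -> Optional[dict]:
    """Reverse direction: derive the answer first, then pose a question about it."""
    known: Dict[str, float] = {}
    for fragment in fragments:
        known.update(fragment)
    derived, steps = fuse(known)
    if target not in derived:
        return None
    return {"question": f"Find the length of {target}.", "cot": steps, "answer": derived[target]}


def consistent(qa: Optional[dict], properties: Dict[str, float], target: str) -> bool:
    """Filter: keep a Q&A pair only if its answer agrees with the engine's computed property."""
    return qa is not None and math.isclose(qa["answer"], properties[target], rel_tol=1e-6)


qa = make_qa([{"CA": 3.0}, {"CB": 4.0}], target="CD")
print(qa["question"], qa["cot"], qa["answer"])
print(consistent(qa, {"CD": 2.4}, "CD"), consistent(qa, {"CD": 2.5}, "CD"))
```

The chain here keeps every fused step, including AD, which the question about CD does not need. Pruning the chain to the target's dependency path would be a natural refinement. The last line shows the filter keeping a pair that matches the computed property and rejecting one that contradicts it.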