A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges

Yibo Yan, Jiamin Su, Jianxiang He, Fangteng Fu, Xu Zheng, Yuanhuiyi Lyu, Kun Wang, Shen Wang, Qingsong Wen, Xuming Hu

TL;DR

This survey maps the landscape of mathematical reasoning in the era of multimodal large language models, organizing progress into benchmarks, methodologies, and challenges. It analyzes more than 200 works published since 2021, highlighting the transition from text-only to multimodal reasoning, and delineates three methodological roles for (M)LLMs: Reasoner, Enhancer, and Planner, with a note on potential hybrid architectures. Key contributions include a taxonomy of benchmarks and evaluation metrics (including $ACC_{step}$, $ACC_{cate}$, PDR, and CHAIR), a synthesis of data strategies, and a list of five major challenges hindering progress toward robust, scalable Math-LLMs. The findings guide researchers toward improving data quality, visual reasoning, cross-domain generalization, and tool-driven planning, ultimately supporting more capable and trustworthy multimodal mathematical reasoning systems for education and scientific computation.

Abstract

Mathematical reasoning, a core aspect of human cognition, is vital across many domains, from educational problem-solving to scientific advancement. As artificial general intelligence (AGI) progresses, integrating large language models (LLMs) with mathematical reasoning tasks is becoming increasingly significant. This survey provides the first comprehensive analysis of mathematical reasoning in the era of multimodal large language models (MLLMs). We review over 200 studies published since 2021 and examine state-of-the-art developments in Math-LLMs, with a focus on multimodal settings. We categorize the field along three dimensions: benchmarks, methodologies, and challenges. In particular, we explore the multimodal mathematical reasoning pipeline, as well as the roles of (M)LLMs and the associated methodologies. Finally, we identify five major challenges hindering the realization of AGI in this domain, offering insights into future directions for enhancing multimodal reasoning capabilities. This survey serves as a critical resource for the research community in advancing the capabilities of LLMs to tackle complex multimodal reasoning tasks.

Paper Structure

This paper contains 16 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: The illustration of our research scope (i.e., investigating the MLLM's math reasoning capability).
  • Figure 2: The release timeline of Math-LLMs in recent years.
  • Figure 3: Typical data format of math reasoning tasks in text-only and multimodal settings. Examples are derived from MathVerse [zhang2025mathverse], which assesses whether and how much MLLMs can truly understand visual diagrams for mathematical reasoning.
  • Figure 4: The illustration of the comparisons among three paradigms of (M)LLM-based mathematical reasoning.