
CharTool: Tool-Integrated Visual Reasoning for Chart Understanding

Situo Zhang, Yifan Zhang, Zichen Zhu, Da Ma, Lei Pan, Danyang Zhang, Zihan Zhao, Lu Chen, Kai Yu

Abstract

Charts are ubiquitous in scientific and financial literature for presenting structured data. However, chart reasoning remains challenging for multimodal large language models (MLLMs) due to the lack of high-quality training data, as well as the need for fine-grained visual grounding and precise numerical computation. To address these challenges, we first propose DuoChart, a scalable dual-source data pipeline that combines synthesized charts with real-world charts to construct diverse, high-quality chart training data. We then introduce CharTool, which equips MLLMs with external tools, including image cropping for localized visual perception and code-based computation for accurate numerical reasoning. Through agentic reinforcement learning on DuoChart, CharTool learns tool-integrated reasoning grounded in chart content. Extensive experiments on six chart benchmarks show that our method consistently improves over strong MLLM baselines across model scales. Notably, CharTool-7B outperforms the base model by **+8.0%** on CharXiv (Reasoning) and **+9.78%** on ChartQAPro, while achieving competitive performance with substantially larger or proprietary models. Moreover, CharTool demonstrates positive generalization to out-of-domain visual math reasoning benchmarks.
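The abstract names two tool types: image cropping for localized perception and code-based computation for numerical reasoning. The following is a minimal sketch of how such tool calls might be dispatched inside a reasoning loop. It is not CharTool's actual interface. The function names `crop`, `compute`, and `dispatch`, the pixel-grid image representation, and the `{"name": ..., "args": ...}` call format are all hypothetical. The restricted arithmetic evaluator also stands in for the sandboxed code execution a real system would use.

```python
import ast
import operator

# Hypothetical image representation: a 2D grid of pixel values (list of rows).
Image = list[list[int]]


def crop(image: Image, x0: int, y0: int, x1: int, y1: int) -> Image:
    """Return the sub-image inside the box [x0, x1) x [y0, y1), clamped to bounds."""
    height = len(image)
    width = len(image[0]) if height else 0
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    if x0 >= x1 or y0 >= y1:
        raise ValueError("empty crop region")
    return [row[x0:x1] for row in image[y0:y1]]


# Whitelisted binary operators for the restricted evaluator.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def compute(expression: str) -> float:
    """Evaluate a pure arithmetic expression without calling eval()."""
    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return ev(ast.parse(expression, mode="eval"))


def dispatch(call: dict, image: Image):
    """Route a model-emitted tool call to the matching (hypothetical) tool."""
    name, args = call["name"], call.get("args", {})
    if name == "crop":
        return crop(image, **args)
    if name == "compute":
        return compute(**args)
    raise KeyError(f"unknown tool: {name}")
```

As a usage example, a model could first crop the region around a bar label, read two values from it, and then call `compute` with `"(42.5 - 38.0) / 38.0 * 100"` to obtain a percentage change rather than estimating it in text. Keeping perception (cropping) and arithmetic (computation) as separate tools mirrors the split the abstract describes between fine-grained visual grounding and precise numerical computation.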

Paper Structure

This paper contains 54 sections, 6 equations, 12 figures, and 16 tables.

Figures (12)

  • Figure 1: Motivation for our method. (Left) Chart reasoning requires fine-grained visual perception and numerical reasoning. (Middle) Synthetic charts often lack diversity and visual quality. (Right) Purely textual reasoning leads to errors on complex layouts, while explicit tool grounding enables accurate, localized analysis. (Only cropping is illustrated; see the case-study appendix for more examples.)
  • Figure 2: Data synthesis pipeline of DuoChart. (A) Chart images are constructed from two sources prior to quality filtering: a scalable LLM-based code synthesis pipeline and real-world chart mining. (B) High-quality QA pairs, forming the DuoChart dataset, are produced by metadata-guided QA generation followed by a rigorous four-stage quality validation. (C) Cold-start trajectories are synthesized by a Tool Agent powered by an advanced MLLM.
  • Figure 3: Data statistics of the charts (Left) and QAs (Right) in DuoChart.
  • Figure 4: Comparison of synthesized dataset quality.
  • Figure 5: Distribution of tool calls across different benchmarks.