InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts

Tianchi Xie, Minzhi Lin, Mengchen Liu, Yilin Ye, Changjian Chen, Shixia Liu

TL;DR

InfoChartQA introduces a large-scale benchmark for multimodal reasoning on infographic charts by pairing each infographic with a data-equivalent plain chart. It provides a two-pronged QA suite: text-based questions grounded in data facts and visual-element-based questions targeting pictograms, icons, and metaphors, including multi-panel co-referential reasoning. The dataset comprises 5,948 infographic charts with 50,920 text-based questions and over 7K visual-element-based questions, along with dummy-focused metaphor questions, enabling fine-grained diagnostics via paired charts. Evaluations across 20 MLLMs reveal a substantial performance gap between infographic and plain charts, with metaphor-related questions being particularly challenging; analyses attribute degradation to visual complexity and weak text–visual alignments, while simple prompt tweaks can yield measurable gains. The work offers a publicly available benchmark and insights to guide future improvements in infographic-chart understanding for multimodal models.

Abstract

Understanding infographic charts with design-driven visual elements (e.g., pictograms, icons) requires both visual recognition and reasoning, posing challenges for multimodal large language models (MLLMs). However, existing visual-question answering benchmarks fall short in evaluating these capabilities of MLLMs due to the lack of paired plain charts and visual-element-based questions. To bridge this gap, we introduce InfoChartQA, a benchmark for evaluating MLLMs on infographic chart understanding. It includes 5,642 pairs of infographic and plain charts, each sharing the same underlying data but differing in visual presentations. We further design visual-element-based questions to capture their unique visual designs and communicative intent. Evaluation of 20 MLLMs reveals a substantial performance decline on infographic charts, particularly for visual-element-based questions related to metaphors. The paired infographic and plain charts enable fine-grained error analysis and ablation studies, which highlight new opportunities for advancing MLLMs in infographic chart understanding. We release InfoChartQA at https://github.com/CoolDawnAnt/InfoChartQA.
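Because each infographic chart shares its underlying data with a plain chart, the same question can be scored on both versions and the results compared directly. The following is a minimal sketch of such a paired comparison, not the authors' evaluation code: the `PairedResult` record, its field names, the `paired_gap` helper, and the toy data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PairedResult:
    """One question scored on both versions of a chart (hypothetical record)."""
    question_id: str
    correct_on_infographic: bool
    correct_on_plain: bool


def paired_gap(results):
    """Return accuracy per chart type, the gap, and questions lost on infographics."""
    n = len(results)
    if n == 0:
        raise ValueError("no results to score")
    acc_info = sum(r.correct_on_infographic for r in results) / n
    acc_plain = sum(r.correct_on_plain for r in results) / n
    # Questions answered correctly on the plain chart but not on its infographic
    # twin isolate errors tied to visual design rather than to the data itself.
    lost = [
        r.question_id
        for r in results
        if r.correct_on_plain and not r.correct_on_infographic
    ]
    return {
        "infographic": acc_info,
        "plain": acc_plain,
        "gap": acc_plain - acc_info,
        "lost": lost,
    }


# Made-up toy data, not taken from the benchmark.
results = [
    PairedResult("q1", True, True),
    PairedResult("q2", False, True),
    PairedResult("q3", False, False),
    PairedResult("q4", True, True),
]
summary = paired_gap(results)
```

The `lost` list is the useful part of this sketch: it singles out the questions a model handles on the plain chart but fails on the infographic, which is the kind of case the paired design makes available for error analysis.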

Paper Structure

This paper contains 36 sections, 9 figures, 14 tables, 1 algorithm.

Figures (9)

  • Figure 1: Overview of InfoChartQA.
  • Figure 2: The InfoChartQA benchmark construction pipeline.
  • Figure 3: Example of progressively removing visual elements from infographic charts.
  • Figure 4: Change in model performance on the same infographic chart with different numbers of visual elements.
  • Figure 5: Different modifications on charts.
  • ...and 4 more figures