EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding

Muye Huang, Han Lai, Xinyu Zhang, Wenjun Wu, Jie Ma, Lingling Zhang, Jun Liu

TL;DR

EvoChart tackles the gap between chart-style training data and real-world chart understanding by introducing a three-stage self-training pipeline that synthesizes high-quality, diverse chart data and an accompanying real-world benchmark, EvoChart-QA. The method alternates between compositional chart generation, chart evaluation/refinement, and QA-pair generation/training to produce progressively harder data and a stronger chart-understanding capability. Empirical results show EvoChart achieving 54.2% accuracy on EvoChart-QA (surpassing GPT-4o at 49.8%) and 81.5% on ChartQA, while also revealing that real-world chart understanding remains challenging for all models, especially on Complex Retrieval tasks and non-tabular chart types like Pie and Scatter. The work demonstrates the value of self-training with a refinement loop and provides a practical, multi-source benchmark that better reflects real-world chart understanding, with potential to guide future advances in visual-language chart reasoning.

Abstract

Chart understanding enables automated data analysis for humans, which requires models to achieve highly accurate visual comprehension. While existing Visual Language Models (VLMs) have shown progress in chart understanding, the lack of high-quality training data and comprehensive evaluation benchmarks hinders VLM chart comprehension. In this paper, we introduce EvoChart, a novel self-training method for generating synthetic chart data to enhance VLMs' capabilities in real-world chart comprehension. We also propose EvoChart-QA, a novel benchmark for measuring models' chart comprehension abilities in real-world scenarios. Specifically, EvoChart is a unique self-training data synthesis approach that simultaneously produces a high-quality training corpus and a high-performance chart understanding model. EvoChart-QA consists of 650 distinct real-world charts collected from 140 different websites and 1,250 expert-curated questions that focus on chart understanding. Experimental results on various open-source and proprietary VLMs tested on EvoChart-QA demonstrate that even the best proprietary model, GPT-4o, achieves only 49.8% accuracy. Moreover, the EvoChart method significantly boosts the performance of open-source VLMs on real-world chart understanding tasks, achieving 54.2% accuracy on EvoChart-QA.
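
The three-stage loop summarized in the TL;DR (chart generation, chart evaluation/refinement, QA-pair generation) can be pictured with a toy sketch. This is not the authors' implementation: every name here (`Chart`, `generate_charts`, `evaluate_and_refine`, `generate_qa`, `self_train`) is hypothetical, random numbers stand in for rendered charts and for a learned evaluator, and the VLM training step is omitted. The rule that the quality threshold tightens each round is likewise an assumption made only to illustrate data getting harder over stages.

```python
import random
from dataclasses import dataclass

CHART_TYPES = ["bar", "line", "pie", "scatter"]

@dataclass
class Chart:
    chart_type: str
    complexity: int   # toy stand-in for visual difficulty
    quality: float    # toy stand-in for an evaluator score in [0, 1]

def generate_charts(n, max_complexity, rng):
    """Stage 1 (toy): compose n charts up to the current complexity level."""
    return [
        Chart(rng.choice(CHART_TYPES), rng.randint(1, max_complexity), rng.random())
        for _ in range(n)
    ]

def evaluate_and_refine(charts, threshold):
    """Stage 2 (toy): keep only charts whose score passes the threshold."""
    return [c for c in charts if c.quality >= threshold]

def generate_qa(charts):
    """Stage 3 (toy): emit one placeholder QA pair per retained chart."""
    return [
        (f"What type of chart is this (complexity {c.complexity})?", c.chart_type)
        for c in charts
    ]

def self_train(rounds=3, charts_per_round=20, seed=0):
    """Run the toy loop and return the accumulated QA corpus."""
    rng = random.Random(seed)
    corpus = []
    threshold = 0.3
    for k in range(1, rounds + 1):
        # Complexity grows with the stage index k (assumption).
        charts = generate_charts(charts_per_round, max_complexity=k, rng=rng)
        kept = evaluate_and_refine(charts, threshold)
        corpus.extend(generate_qa(kept))
        # Assumption: the evaluator becomes stricter after each stage.
        threshold = min(0.9, threshold + 0.1)
    return corpus

if __name__ == "__main__":
    qa_pairs = self_train()
    print(f"collected {len(qa_pairs)} toy QA pairs")
```

In a real system each stage would be backed by a chart renderer and a trained model rather than random scores; the sketch only shows how filtered output from one stage feeds the next.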

Paper Structure

This paper contains 20 sections, 9 figures, and 6 tables.

Figures (9)

  • Figure 1: Case of Modified ChartQA. "Original" refers to the question from the ChartQA dataset, while "Modified" refers to our modified version.
  • Figure 2: The overview of the proposed EvoChart method. The figure depicts a counterclockwise cyclical self-training process, where the Chart Evaluator of each stage $k$ is trained based on the results of the previous stage $k-1$.
  • Figure 3: Four cases from the EvoChart-QA Benchmark. Q1 and Q2 are line charts, Q3 is a scatter chart, and Q4 is a pie chart.
  • Figure 4: Overview of the EvoChart-QA Benchmark.
  • Figure 5: Case 1 of EvoChart-QA. "QL" indicates that the corresponding image is on the left side, while "QR" indicates that it is on the right side.
  • ...and 4 more figures