Beyond Description: A Multimodal Agent Framework for Insightful Chart Summarization
Yuhang Bai, Yujuan Ding, Shanru Lin, Wenqi Fan
TL;DR
The paper addresses a gap in chart summarization: existing methods mostly produce surface-level descriptions and rarely surface the high-level insights a chart is meant to convey. It introduces Chart Insight Agent Flow (CIAF), a training-free plan-and-execute framework consisting of a Planner, an Insight Extractor, and a Summarizer that draws data- and domain-driven insights from chart images, and pairs it with ChartSummInsights, a dataset of real-world charts with insightful summaries written by human data analysis experts. The authors also propose an evaluation framework focused on insight quality and diversity, and report that CIAF yields deeper, more diverse, and more accurate summaries than existing baselines across several MLLM backbones. The work advances data accessibility and domain-aware storytelling by enabling AI systems to produce coherent, insight-rich chart narratives with fewer factual errors. Together, ChartSummInsights and the CIAF pipeline offer a practical, benchmarked starting point for future research in multimodal visual reasoning and chart analysis.
Abstract
Chart summarization is crucial for enhancing data accessibility and the efficient consumption of information. However, existing methods, including those based on Multimodal Large Language Models (MLLMs), primarily focus on low-level data descriptions and often fail to capture the deeper insights that are the fundamental purpose of data visualization. To address this challenge, we propose Chart Insight Agent Flow, a plan-and-execute multi-agent framework that leverages the perceptual and reasoning capabilities of MLLMs to uncover profound insights directly from chart images. Furthermore, to overcome the lack of suitable benchmarks, we introduce ChartSummInsights, a new dataset featuring a diverse collection of real-world charts paired with high-quality, insightful summaries authored by human data analysis experts. Experimental results demonstrate that our method significantly improves the performance of MLLMs on the chart summarization task, producing summaries with deep and diverse insights.
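To make the plan-and-execute structure concrete, the following is a minimal sketch of how a Planner, an Insight Extractor, and a Summarizer could be chained around a single MLLM call. It is not the authors' implementation: the abstract does not specify prompts, interfaces, or data structures, so every name here (`MLLM`, `plan`, `extract_insights`, `summarize`, `run_pipeline`, `stub_mllm`) and every prompt string is a hypothetical stand-in, and the model is replaced by a stub so the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model interface: (prompt, chart image path) -> generated text.
# A real MLLM client would be wrapped to match this signature.
MLLM = Callable[[str, str], str]


@dataclass
class Insight:
    question: str  # analysis question produced by the planner
    finding: str   # the model's answer grounded in the chart


def plan(mllm: MLLM, chart_path: str) -> List[str]:
    """Planner: ask the model for analysis questions, one per line."""
    raw = mllm("List analysis questions worth asking about this chart, one per line.", chart_path)
    return [line.strip() for line in raw.splitlines() if line.strip()]


def extract_insights(mllm: MLLM, chart_path: str, questions: List[str]) -> List[Insight]:
    """Insight Extractor: answer each planned question against the chart image."""
    return [
        Insight(q, mllm(f"Answer from the chart: {q}", chart_path).strip())
        for q in questions
    ]


def summarize(mllm: MLLM, chart_path: str, insights: List[Insight]) -> str:
    """Summarizer: merge the individual findings into one narrative."""
    notes = "\n".join(f"- {i.question} -> {i.finding}" for i in insights)
    return mllm(f"Write a coherent summary from these insights:\n{notes}", chart_path).strip()


def run_pipeline(mllm: MLLM, chart_path: str) -> str:
    """Plan first, then execute each step, then summarize."""
    questions = plan(mllm, chart_path)
    insights = extract_insights(mllm, chart_path, questions)
    return summarize(mllm, chart_path, insights)


def stub_mllm(prompt: str, chart_path: str) -> str:
    """Stand-in model that returns canned text based on the prompt prefix."""
    if prompt.startswith("List"):
        return "What is the overall trend?\nWhich category is the outlier?"
    if prompt.startswith("Answer"):
        return "stub finding"
    return "stub summary"


if __name__ == "__main__":
    print(run_pipeline(stub_mllm, "chart.png"))
```

Because the pipeline only relies on prompting an existing model, swapping `stub_mllm` for a different backbone requires no retraining, which is the sense in which such a framework can be training-free.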
