ChartInsighter: An Approach for Mitigating Hallucination in Time-series Chart Summary Generation with A Benchmark Dataset
Fen Wang, Bomiao Wang, Xueli Shu, Zhen Liu, Zekai Shao, Chao Liu, Siming Chen
TL;DR
This work tackles hallucinations in automated time-series chart summaries by introducing ChartInsighter, a multi-agent, tool-augmented framework that uses external data analysis modules and a self-consistency check to produce accurate, semantically rich summaries. It identifies key L1-L3 content elements and a taxonomy of hallucination types, and it demonstrates how iterative brainstorming, refining, and self-consistency can mitigate errors such as Extremum and Trend Direction mistakes. A high-quality benchmark (75 charts, 2693 sentences) with sentence-level hallucination annotations enables rigorous evaluation and comparison against GPT-4 and VL2NL. Empirical results show ChartInsighter achieves higher semantic richness and a lower hallucination rate, offering a practical, scalable solution for reliable time-series chart interpretation in decision-making workflows.
Abstract
Effective chart summaries can significantly reduce the time and effort decision makers spend interpreting charts, enabling precise and efficient communication of data insights. Previous studies have struggled to generate accurate and semantically rich summaries of time-series data charts. In this paper, we identify the summary elements and common hallucination types that arise when generating time-series chart summaries, and we use them as guidelines for automatic generation. We introduce ChartInsighter, which automatically generates summaries of time-series charts while effectively reducing hallucinations. Specifically, we assign multiple agents to generate an initial chart summary and refine it through iterative collaboration, during which they invoke external data analysis modules to extract insights and compile them into a coherent summary. Additionally, we implement a self-consistency test method to validate and correct the summary. We create a high-quality benchmark of charts and summaries, with hallucination types annotated sentence by sentence, which makes it possible to evaluate how effectively a method reduces hallucinations. Evaluations on our benchmark show that our method surpasses state-of-the-art models and achieves the lowest summary hallucination rate, effectively reducing various types of hallucination and improving summary quality. The benchmark is available at https://github.com/wangfen01/ChartInsighter.
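To make the idea of validating summary statements against the underlying data concrete, the following is a minimal sketch of checking two of the claim types named above (an extremum and a trend direction) against a time series. It is not the ChartInsighter implementation: the function names, the dictionary format for claims, and the use of a least-squares slope sign to decide direction are all assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def check_extremum(values: Sequence[float], claimed_index: int, kind: str = "max") -> bool:
    """Return True if the value at claimed_index is the series maximum (or minimum)."""
    if kind not in ("max", "min"):
        raise ValueError("kind must be 'max' or 'min'")
    target = max(values) if kind == "max" else min(values)
    return values[claimed_index] == target


def trend_direction(values: Sequence[float]) -> str:
    """Classify the overall direction by the sign of the least-squares slope.

    Assumption: points are equally spaced in time, so the index is used as x.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to determine a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    # The slope has the same sign as the covariance between index and value.
    cov = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    if cov > 0:
        return "increasing"
    if cov < 0:
        return "decreasing"
    return "flat"


def verify_claims(values: Sequence[float], claims: List[Dict]) -> List[Tuple[Dict, bool]]:
    """Pair each claim (hypothetical dict format) with whether the data supports it."""
    results = []
    for claim in claims:
        if claim["type"] == "extremum":
            ok = check_extremum(values, claim["index"], claim["kind"])
        elif claim["type"] == "trend":
            ok = trend_direction(values) == claim["direction"]
        else:
            raise ValueError(f"unsupported claim type: {claim['type']}")
        results.append((claim, ok))
    return results


series = [3.0, 5.0, 4.0, 9.0, 7.0]
claims = [
    {"type": "extremum", "kind": "max", "index": 3},   # supported by the data
    {"type": "extremum", "kind": "max", "index": 1},   # an Extremum error
    {"type": "trend", "direction": "decreasing"},      # a Trend Direction error
]
flags = [ok for _, ok in verify_claims(series, claims)]
```

In this sketch, a claim flagged as unsupported would be a candidate for correction or removal; how ChartInsighter's agents and self-consistency test actually handle such cases is described in the paper itself.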
