ChartInsighter: An Approach for Mitigating Hallucination in Time-series Chart Summary Generation with A Benchmark Dataset

Fen Wang; Bomiao Wang; Xueli Shu; Zhen Liu; Zekai Shao; Chao Liu; Siming Chen

TL;DR

This work tackles hallucinations in automated time-series chart summaries by introducing ChartInsighter, a multi-agent, tool-augmented framework that uses external data analysis modules and a self-consistency check to produce accurate, semantically rich summaries. It identifies key L1-L3 content elements and a taxonomy of hallucination types, and it demonstrates how iterative brainstorming, refining, and self-consistency can mitigate errors such as Extremum and Trend Direction mistakes. A high-quality benchmark (75 charts, 2693 sentences) with sentence-level hallucination annotations enables rigorous evaluation and comparison against GPT-4 and VL2NL. Empirical results show ChartInsighter achieves higher semantic richness and a lower hallucination rate, offering a practical, scalable solution for reliable time-series chart interpretation in decision-making workflows.

Abstract

Effective chart summaries can significantly reduce the time and effort decision makers spend interpreting charts, enabling precise and efficient communication of data insights. Previous studies have struggled to generate accurate and semantically rich summaries of time-series data charts. In this paper, we identify summary elements and common hallucination types in the generation of time-series chart summaries, which serve as our guidelines for automatic generation. We introduce ChartInsighter, which automatically generates chart summaries of time-series data while effectively reducing hallucinations. Specifically, we assign multiple agents to generate the initial chart summary and collaborate iteratively, during which they invoke external data analysis modules to extract insights and compile them into a coherent summary. Additionally, we implement a self-consistency test method to validate and correct the summary. We create a high-quality benchmark of charts and summaries, with hallucination types annotated on a sentence-by-sentence basis, which facilitates evaluating how effectively a method reduces hallucinations. Evaluations on our benchmark show that our method surpasses state-of-the-art models and achieves the lowest hallucination rate, reducing various types of hallucination and improving summary quality. The benchmark is available at https://github.com/wangfen01/ChartInsighter.
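To make the sentence-level annotation concrete, the following is a minimal sketch of how a hallucination rate could be computed from such annotations. It is not the benchmark's actual schema or the authors' metric: the `AnnotatedSentence` type, the field names, the example sentences, and the definition of the rate (the share of sentences carrying at least one hallucination label) are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AnnotatedSentence:
    """One summary sentence with its hallucination labels (hypothetical schema)."""
    text: str
    hallucinations: list[str] = field(default_factory=list)


def hallucination_rate(sentences):
    """Fraction of sentences that carry at least one hallucination label.

    Assumed definition for this sketch; the paper may define the rate differently.
    """
    if not sentences:
        return 0.0
    flagged = sum(1 for s in sentences if s.hallucinations)
    return flagged / len(sentences)


def type_counts(sentences):
    """Count how often each hallucination type appears across all sentences."""
    return Counter(label for s in sentences for label in s.hallucinations)


# Made-up annotations, not taken from the benchmark.
summary = [
    AnnotatedSentence("The series peaks in 2008.", ["Extremum Error"]),
    AnnotatedSentence("Values rise steadily after 2010.", ["Trend Direction Error"]),
    AnnotatedSentence("The chart covers 2000 to 2015."),
    AnnotatedSentence("The lowest value occurs in 2003."),
]

rate = hallucination_rate(summary)
counts = type_counts(summary)
print(f"hallucination rate: {rate:.2f}")
print(dict(counts))
```

Counting at the sentence level keeps the rate comparable across summaries of different lengths, and the per-type counts correspond to the kind of breakdown a frequency chart of hallucination types would show.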

Paper Structure (22 sections, 7 figures, 1 table)

Introduction
Related Work
Large Language Models for Visualization
Enhancing Reasoning and Factual Knowledge in LLMs
Large Language Models for Chart Summarization
Preliminaries
Requirements
Summary Elements
Hallucination Types
ChartInsighter
Brainstorming
Refining
Self-consistency Test
Linking Summary to Chart
Benchmark
...and 7 more sections

Figures (7)

Figure 1: Examples of time-series chart summaries generated with GPT-4, VL2NL, and ChartInsighter. Errors are indicated in red text, while correct points are highlighted in green text. GPT-4 makes an "Extremum Error", misidentifying 2008 as the peak year instead of the correct year, 2007, and a "Trend Direction Error", incorrectly describing a downward trend as an upward trend. VL2NL makes a "Numerical Value Error", incorrectly calculating Apple's average stock price. In contrast, ChartInsighter provides a correct summary.
Figure 2: Examples of time-series chart summary elements. We classify them into L1-L3, employ simple line diagrams to visually illustrate the meaning of these elements, and present example sentences containing specific elements.
Figure 3: The frequency of different types of hallucinations in LLM-generated time-series chart summaries.
Figure 4: The pipeline of ChartInsighter includes three steps: Brainstorming, Refining, and Self-consistency Test. ChartInsighter takes a visualization specification and a data table as input to initiate the analysis process. The input is first handled by both Uni-Insighter and Multi-Insighter, which generate preliminary uni- and multi-dimensional data insights respectively and compile an initial summary. In the refining stage, we design a multi-agent collaborative process between the Multi-Insighter and the Writer. This iterative process, which involves both mining and organizing insights, yields a relatively accurate and comprehensive summary. Finally, in the self-consistency test phase, we concentrate on identifying and addressing key types of hallucinations to produce the final version of the chart summary. In Prompt Template, we display the input, prompt, and output of each step. For example, the input, prompt, and output of Step c are shown in Prompt Template c'. Note that Step g builds upon the input and prompt from Step d, with the additional new content highlighted in orange font in Prompt Template g'.
Figure 5: The overview of ChartInsighter. Users can input a Vega-Lite specification and data table to generate a summary. By hovering over sentences containing data references, the corresponding portions in the chart are highlighted (a). Additionally, users can interact with the chat view, prompting the model to modify the summary or elaborate on details they find more interesting, resulting in a more satisfactory summary.
...and 2 more figures
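
The Extremum and Trend Direction errors described in the Figure 1 caption can be checked mechanically against the underlying data table. The sketch below illustrates that idea only; it is not ChartInsighter's self-consistency test. The function names and the toy series are hypothetical, and the trend direction is approximated by comparing the two endpoint values, which is a deliberate simplification.

```python
def peak_year(series):
    """Return the year with the largest value in a {year: value} mapping."""
    return max(series, key=series.get)


def trend_direction(series, start, end):
    """Classify the change between two years by comparing endpoint values only.

    Simplifying assumption: intermediate fluctuations are ignored.
    """
    delta = series[end] - series[start]
    if delta > 0:
        return "upward"
    if delta < 0:
        return "downward"
    return "flat"


def check_extremum_claim(series, claimed_year):
    """True if the claimed peak year matches the data."""
    return peak_year(series) == claimed_year


def check_trend_claim(series, start, end, claimed):
    """True if the claimed direction matches the endpoint comparison."""
    return trend_direction(series, start, end) == claimed


# Made-up values for illustration; not data from the paper or its benchmark.
toy = {2005: 40.0, 2006: 55.0, 2007: 90.0, 2008: 60.0, 2009: 45.0}

print(check_extremum_claim(toy, 2008))                 # claim: "peak in 2008"
print(check_trend_claim(toy, 2007, 2009, "upward"))    # claim: "rises after 2007"
```

A claim that fails such a check could be flagged for correction before the summary is finalized. A more robust check would fit a slope over the whole interval rather than comparing two points.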
