Natural Language Generation for Visualizations: State of the Art, Challenges and Future Directions

Enamul Hoque; Mohammed Saidul Islam

TL;DR

This survey analyzes the state of the art in natural language generation for visualizations and proposes a five-dimension Wh-question taxonomy to organize the problem space. It synthesizes methods across chart captioning, chart summarization, chart QA, and data storytelling, detailing rule-based and deep learning approaches, including systems based on transformers and large language models. The authors identify key data and methodological gaps (such as data extraction from charts, evaluation benchmarks, and biases) and discuss avenues for future work, including generalizable models, richer multimodal inputs/outputs, human-in-the-loop workflows, and ethical considerations. Overall, the work provides a structured roadmap for advancing NLG in visualization to improve accessibility, comprehension, and trust in data-driven narratives.

Abstract

Natural language and visualization are two complementary modalities of human communication that play a crucial role in conveying information effectively. While visualizations help people discover trends, patterns, and anomalies in data, natural language descriptions help explain these insights. Thus, combining text with visualizations is a prevalent technique for effectively delivering the core message of the data. Given the rise of natural language generation (NLG), there is growing interest in automatically creating natural language descriptions for visualizations, which can serve as chart captions, answer questions about charts, or tell data-driven stories. In this survey, we systematically review the state of the art on NLG for visualizations and introduce a taxonomy of the problem. The NLG tasks fall within the domain of Natural Language Interfaces (NLI) for visualization, an area that has garnered significant attention from both the research community and industry. To narrow the scope of the survey, we concentrate primarily on research that focuses on text generation for visualizations. To characterize the NLG problem and the design space of proposed solutions, we pose five Wh-questions: why and how NLG tasks are performed for visualizations, what the task inputs and outputs are, and where and when the generated texts are integrated with visualizations. We categorize the solutions used in the surveyed papers based on these "five Wh-questions." Finally, we discuss the key challenges and potential avenues for future research in this domain.

Paper Structure (39 sections, 8 figures, 4 tables)

This paper contains 39 sections, 8 figures, 4 tables.

Introduction
Methodology and Outline of the Survey
Survey Methodology
Survey Outline
Why?
Downstream Tasks
Applications
Discussion
What?
Input
Visualization
Data table
Text
Multimodal
Output
...and 24 more sections

Figures (8)

Figure 1: An overview of the problem space of NLG with visualization, covering each of the Wh-question dimensions. From left, part (a) represents the Why dimension, (b) the What dimension, (c) the How dimension, (d) the Where dimension, and (e) the When dimension. In the 'What' dimension, the numbers (e.g., '1', '2', '3') in both the input and output refer to individual input and output types. These individual types can also be combined to form a combined input/output type.
Figure 2: An overview of the types of NLG tasks. Sub-figure (a) depicts an example output of Chart Summarization [kantharaj-etal-2022-chart], sub-figure (b) depicts an example of Chart Question-Answering from [kantharaj-etal-2022-opencqa], and sub-figure (c) shows an example of a visual story generated by Calliope [shi2021calliope].
Figure 3: An example of different types of inputs, i.e., rasterized chart image, data table, scene graph (a structured format, similar to a web page's Document Object Model, that comprises the characteristics of a chart), and chart captions generated by the VisText system [tang2023vistext]. The system produces Level 1 and Level 2/Level 3 captions based on [lundgard2021accessible]. The semantic levels are discussed in the subsection on text output.
Figure 4: An example of a visual storytelling system from [sun2023erato]. The example presents a compelling storyline about global natural disasters. It features data points labeled in black (a, c, e, f), which were carefully chosen by a professional data analyst to serve as pivotal moments in the narrative. Additionally, points labeled in red (b, d) were generated through an interpolation algorithm.
Figure 5: A typology of errors in generated chart captions proposed by [huang2023lvlms], namely, "Value Error" (incorrect data value), "Label Error" (incorrect label or category), "Trend Error" (incorrect presentation of a trend), "Magnitude Error" (incorrect presentation of the extent or degree of a trend's change), "Out-of-context Error" (value mentioned in the caption that does not exist in the chart), "Nonsense Error" (illogical inclusion of words), and "Grammatical Error" (grammatical mistakes).
...and 3 more figures
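
The error typology in the Figure 5 caption can be captured as a small data structure. The following is a minimal Python sketch, not the method of huang2023lvlms: the enum members restate the seven error types from the caption, while `unsupported_numbers` and its number-matching heuristic are hypothetical additions for illustration. The heuristic only flags numbers in a caption that match no value in the chart's data, which makes them candidates for a Value Error or an Out-of-context Error; it cannot tell the two apart and says nothing about the other five error types.

```python
import re
from enum import Enum


class CaptionError(Enum):
    """The seven caption error types named in the Figure 5 caption."""
    VALUE = "incorrect data value"
    LABEL = "incorrect label or category"
    TREND = "incorrect presentation of a trend"
    MAGNITUDE = "incorrect presentation of the extent or degree of a trend's change"
    OUT_OF_CONTEXT = "value mentioned in the caption that does not exist in the chart"
    NONSENSE = "illogical inclusion of words"
    GRAMMATICAL = "grammatical mistakes"


# Assumption: captions write numbers as plain non-negative digits such as
# "25" or "3.5". Thousands separators, signs, and spelled-out numbers are
# not handled by this sketch.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unsupported_numbers(caption: str, chart_values: list[float]) -> list[float]:
    """Return the numbers in `caption` that match no value in `chart_values`.

    Hypothetical heuristic: a returned number is only a candidate for
    CaptionError.VALUE or CaptionError.OUT_OF_CONTEXT and still needs review.
    """
    known = {float(v) for v in chart_values}
    found = (float(token) for token in NUMBER.findall(caption))
    return [number for number in found if number not in known]
```

For example, calling `unsupported_numbers("Sales rose from 10 to 25 in 2020.", [10, 20, 2020])` flags only `25.0`, because 10 and 2020 both appear among the chart values supplied by the caller.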
