DataNarrative: Automated Data-Driven Storytelling with Visualizations and Texts

Mohammed Saidul Islam, Md Tahmid Rahman Laskar, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty

TL;DR

The paper proposes a multi-agent framework with two LLM agents designed to replicate the human storytelling process: one agent understands and describes the data, generates the outline, and writes the narration, while the other verifies the output of each intermediate step.

Abstract

Data-driven storytelling is a powerful method for conveying insights by combining narrative techniques with visualizations and text. These stories integrate visual aids, such as highlighted bars and lines in charts, along with textual annotations explaining insights. However, creating such stories requires a deep understanding of the data and meticulous narrative planning, often necessitating human intervention, which can be time-consuming and mentally taxing. While Large Language Models (LLMs) excel in various NLP tasks, their ability to generate coherent and comprehensive data stories remains underexplored. In this work, we introduce a novel task for data story generation and a benchmark containing 1,449 stories from diverse sources. To address the challenges of crafting coherent data stories, we propose a multiagent framework employing two LLM agents designed to replicate the human storytelling process: one for understanding and describing the data (Reflection), generating the outline, and narration, and another for verification at each intermediary step. While our agentic framework generally outperforms non-agentic counterparts in both model-based and human evaluations, the results also reveal unique challenges in data story generation.
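
The generate-then-verify loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stage names follow the abstract, but the function signatures, the `max_revisions` limit, and the toy agents that stand in for real LLM calls are all assumptions made for the example.

```python
from typing import Callable, Dict, Tuple

# Hypothetical agent signatures (not taken from the paper).
# Generator: (stage, context, feedback) -> draft text
# Verifier:  (stage, context, draft)    -> (accepted, feedback)
Generator = Callable[[str, str, str], str]
Verifier = Callable[[str, str, str], Tuple[bool, str]]

# Stages named in the abstract: reflection, outline, narration.
STAGES = ["reflection", "outline", "narration"]


def run_pipeline(data: str, generate: Generator, verify: Verifier,
                 max_revisions: int = 2) -> Dict[str, str]:
    """Run each stage, letting the verifier request revisions."""
    context = data
    outputs: Dict[str, str] = {}
    for stage in STAGES:
        feedback = ""
        draft = generate(stage, context, feedback)
        for _ in range(max_revisions):  # assumed revision budget
            accepted, feedback = verify(stage, context, draft)
            if accepted:
                break
            draft = generate(stage, context, feedback)
        outputs[stage] = draft
        # Later stages see the data plus every earlier stage's output.
        context = context + "\n" + draft
    return outputs


# Toy stand-ins for the two LLM agents, used only to exercise the loop.
def toy_generate(stage: str, context: str, feedback: str) -> str:
    suffix = " (revised)" if feedback else ""
    return f"{stage} draft{suffix}"


def toy_verify(stage: str, context: str, draft: str) -> Tuple[bool, str]:
    # Reject the first outline draft so the revision path is taken once.
    if stage == "outline" and "revised" not in draft:
        return False, "outline needs revision"
    return True, ""


story = run_pipeline("year,value\n2000,1\n2010,3", toy_generate, toy_verify)
```

In this sketch the verifier is consulted after every stage rather than only on the final story, which mirrors the "verification at each intermediary step" wording; how the actual framework handles rejected drafts is not specified in the abstract.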

Paper Structure

This paper contains 30 sections, 29 figures, 8 tables, and 1 algorithm.

Figures (29)

  • Figure 1: An example data story in our corpus, extracted from GapMinder.
  • Figure 2: An overview of the proposed LLM-Agent framework for data story generation.
  • Figure 3: An example of a GPT-4o-generated story using the agentic framework. The text in blue denotes a hallucinated fact, while the red-circled value is factually incorrect according to 'Table_0' of the referenced figure (fig:fact_hall_gpt4o_table).
  • Figure 4: Distribution of story topics in the train set.
  • Figure 5: An overview of the chart data extraction process using the Gemini-1.0-pro-vision model (Gemini Team, 2023).
  • ...and 24 more figures