Compendia: Automated Visual Storytelling Generation from Online Article Collection

Manusha Karunathilaka, Litian Lei, Yiming Gao, Yong Wang, Jiannan Li

TL;DR

Compendia tackles the challenge of generating coherent data stories from unstructured online article collections. It introduces a two-module framework—Data Fact Extraction and Organization, and Visual Storytelling—leveraging LLMs for retrieval, extraction, clustering, and narrative construction, presented in an interactive scrollytelling interface. The system is evaluated through quantitative accuracy metrics (e.g., 97.2% fact-content and data-point accuracy) and two user studies, demonstrating high usability and the ability to produce engaging, source-traceable narratives. The work advances automated storytelling from unstructured text and suggests future directions in fact-checking, temporal reasoning, and personalized storytelling to enhance trust and applicability in real-world search interfaces.

Abstract

In the digital age, readers value quantitative journalism that is clear, concise, analytical, and human-centred. To understand complex topics, they often piece together scattered facts from multiple articles. Visual storytelling can transform fragmented information into clear, engaging narratives, yet its use with unstructured online articles remains largely unexplored. To fill this gap, we present Compendia, an automated system that analyzes online articles in response to a user's query and generates a coherent data story tailored to the user's informational needs. Compendia addresses key challenges of storytelling from unstructured text through two modules that together cover four stages: Online Article Retrieval, which gathers relevant articles; Data Fact Extraction, which identifies, validates, and refines quantitative facts; Fact Organization, which clusters and merges related facts into coherent thematic groups; and Visual Storytelling, which transforms the organized facts into narratives with visualizations in an interactive scrollytelling interface. We evaluated Compendia through a quantitative analysis, confirming its accuracy in fact extraction and organization, and through two user studies with 16 participants, demonstrating its usability, effectiveness, and ability to produce engaging visual stories for open-ended queries.
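
To make the flow of stages in the abstract concrete, the following is a minimal sketch of how extracted facts might move from articles to story sections while staying traceable to their sources. It is not Compendia's implementation. The abstract does not detail how extraction or clustering is performed, so a regular expression that keeps sentences containing a number and a keyword lookup are used here purely as stand-ins. Retrieval is omitted: the sketch takes already-retrieved articles as input. All names (`Article`, `DataFact`, `extract_facts`, `organize_facts`, `build_story`) and the `example.org` URL are hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    url: str
    year: int
    text: str


@dataclass(frozen=True)
class DataFact:
    sentence: str    # sentence carrying the quantitative claim
    value: str       # the number as it appears in the text
    source_url: str  # kept so each fact stays traceable to its article
    year: int        # publication year of the source article


# Assumption: a regex stands in for the system's fact extraction step.
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")


def extract_facts(article):
    """Keep every sentence that contains at least one number."""
    facts = []
    for sentence in re.split(r"(?<=[.!?])\s+", article.text.strip()):
        match = NUMBER.search(sentence)
        if match:
            facts.append(
                DataFact(sentence, match.group(), article.url, article.year)
            )
    return facts


def organize_facts(facts, themes):
    """Assumption: a keyword lookup stands in for thematic clustering."""
    clusters = defaultdict(list)
    for fact in facts:
        lowered = fact.sentence.lower()
        theme = next((t for t in themes if t in lowered), "other")
        clusters[theme].append(fact)
    return dict(clusters)


def build_story(clusters):
    """Emit one story section per cluster, with its sources attached."""
    return [
        {
            "theme": theme,
            "facts": [f.sentence for f in facts],
            "sources": sorted({f.source_url for f in facts}),
        }
        for theme, facts in clusters.items()
    ]


if __name__ == "__main__":
    article = Article(
        "https://example.org/a",  # hypothetical source
        2021,
        "Homeschooling grew quickly. About 6% of students were homeschooled.",
    )
    clusters = organize_facts(extract_facts(article), ["students", "cost"])
    for section in build_story(clusters):
        print(section)
```

Carrying `source_url` and `year` on every fact is the design choice worth noting: it is what would let an interface link each statement back to its article and color-code facts by publication year.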

Paper Structure (36 sections, 7 figures)

This paper contains 36 sections and 7 figures.

Figures (7)

  • Figure 1: Compendia transforms the query "Is homeschooling preferred by people?" into a structured data story by extracting, clustering, and visualizing key facts using unstructured data from a collection of online articles. Thematic Overview uses Thematic Circles to visualize clustered facts across different themes. (A) Filter widget provides control over the overview, (B) Detailed fact panel provides fact content and source details, (C) Articles panel presents all retrieved articles relevant to the story, (D) Related Articles panel displays articles relevant to the topic, (E) Related facts panel provides the number of facts belonging to the topic, (F) Shared articles panel lists sources covering multiple aspects of the topic, and (G) Summary panel displays article and fact statistics.
  • Figure 2: The framework of our system, Compendia. It consists of two main modules: A) the Data Fact Extraction and Organization module, which includes Online Article Retrieval, Data Fact Extraction, and Fact Organization; and B) the Visual Storytelling module, which prepares the data for display and presents it as an interactive scrollytelling story. Each stage involves steps performed in the given order, with iterative validation and refinement phases. The boxes highlight the outputs at each stage.
  • Figure 3: A Thematic Circle represents one cluster of facts, with the cluster's representative fact (A8) displayed in the inner circle and individual data facts (A7) positioned in the bottom half of the outer area, where each circle is color-coded by the publication year of its associated article.
  • Figure 4: Scrolly flow of Compendia. The system transitions from Thematic Overview (A) to Story View (B) when scrolling down. (B) illustrates the narrative related to U.S. homeschool growth, where the corresponding fact is highlighted in the outer area of the zoomed thematic circle. Since this narrative is derived from one of the top 3 facts in the cluster, the related fact is also highlighted within the zoomed thematic circle.
  • Figure 5: A Compendia-generated story for the query "TikTok trends worldwide", showcasing: (A) the Thematic Overview; (B-C) the continuous scroll-down flow from (A) to (C) to explore the story; and (D) jumping to the story in the Youth Appeal theme by clicking the thematic circle (A6) in (A).
  • ...and 2 more figures