A Directed Graph Model and Experimental Framework for Design and Study of Time-Dependent Text Visualisation

Songhai Fan, Simon Angus, Tim Dwyer, Ying Yang, Sarah Goodwin, Helen Purchase

TL;DR

Analysis of individual decision-making in this study hints at a future where text discourse visualisation may need to dispense with a one-size-fits-all approach and, instead, should be more adaptable to the specific user who is exploring the visualisation in front of them.

Abstract

Exponential growth in the quantity of digital news, social media, and other textual sources makes it difficult for humans to keep up with rapidly evolving narratives about world events. Various visualisation techniques have been touted to help people to understand such discourse by exposing relationships between texts (such as news articles) as topics and themes evolve over time. Arguably, the understandability of such visualisations hinges on the assumption that people will be able to easily interpret the relationships in such visual network structures. To test this assumption, we begin by defining an abstract model of time-dependent text visualisation based on directed graph structures. From this model we distill motifs that capture the set of possible ways that texts can be linked across changes in time. We also develop a controlled synthetic text generation methodology that leverages the power of modern LLMs to create fictional, yet structured sets of time-dependent texts that fit each of our patterns. Therefore, we create a clean user study environment (n=30) for participants to identify patterns that best represent a given set of synthetic articles. We find that it is a challenging task for the user to identify and recover the predefined motif. We analyse qualitative data to map an unexpectedly rich variety of user rationales when divergences from expected interpretation occur. A deeper analysis also points to unexpected complexities inherent in the formation of synthetic datasets with LLMs that undermine the study control in some cases. Furthermore, analysis of individual decision-making in our study hints at a future where text discourse visualisation may need to dispense with a one-size-fits-all approach and, instead, should be more adaptable to the specific user who is exploring the visualisation in front of them.


Paper Structure

This paper contains 52 sections, 1 equation, 10 figures, 2 tables, 2 algorithms.

Figures (10)

  • Figure 1: Representative narrative structures identified in our literature survey. They have in common that one spatial dimension indicates time (here, left-to-right). Different structures afford different ways to show relationships between announcements. While all use linear elements to connect pairs of announcements, the non-time dimension (here vertical) can show secondary relationships (such as shared events in storyline views) or strict thematic separation (such as thread views). Note that River is a subtype of Thread that additionally encodes time-dependent quantity (typically term frequency).
  • Figure 2: Left: An illustration of a Time-Track Narrative Graph (TTNG) Model. Right: The three-node motifs for the Time-Track Narrative Graph Model include Sequential motifs (Linear, Arch, Ladder, Early Turn, Late Turn) and Non-Sequential motifs (Sharp Branch, Wide Branch, Sharp Merge, Wide Merge).
  • Figure 3: The Graph-to-Text Pipeline. This diagram illustrates the process of converting a TTNG into synthetic news text through three main stages: (1) Enriching the graph with narrative context using the Crafter; (2) Mapping narrative context elements with the Cartographer; (3) Generating the final synthetic news text with the Writer. The process enforces structural patterns through shared attributes while allowing natural thematic connections to emerge. For additional examples of narrative context, prompt templates, and the map algorithm, refer to the supplementary materials (\ref{appendix:llm-graph-text-generation}).
  • Figure 4: Distribution of similarity scores for Announcement pairs in our synthetic dataset. Box extents give the 25th to 75th percentiles, the vertical line gives the median, and the extent of the horizontal lines indicates the min-max range. Red boxes show similarity scores for Announcements from different tracks (expected to be lower), while blue boxes show scores for Announcements in the same track (expected to be higher). Across all three metrics (Jaccard, TF-IDF, and BERT), there is clear separation between same-track and different-track pairs, with same-track pairs consistently showing higher similarity scores than different-track pairs.
  • Figure 5: Confusion matrix comparing participant-selected motifs (rows) against predefined motifs (columns). Diagonal cells indicate correct identifications. The Grand Total column shows total selections per motif. Each column sums to 30, controlled by the graph-to-text pipeline input. Numbered talk bubbles correspond to detailed analysis in \ref{sec:case}.
  • ...and 5 more figures
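
The same-track versus different-track comparison described in the Figure 4 caption can be illustrated with a minimal sketch of the Jaccard metric alone. The announcement texts, the track labels, the word-level tokenisation, and the helper names `jaccard` and `split_pairs` are all assumptions made for illustration; they are not taken from the paper's dataset or pipeline.

```python
import re
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two texts."""
    tokens_a = set(re.findall(r"[a-z0-9]+", a.lower()))
    tokens_b = set(re.findall(r"[a-z0-9]+", b.lower()))
    if not tokens_a and not tokens_b:
        return 0.0  # convention chosen here for two empty texts
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def split_pairs(announcements):
    """Score every pair and group the scores by same vs different track."""
    same, different = [], []
    for (track_a, text_a), (track_b, text_b) in combinations(announcements, 2):
        score = jaccard(text_a, text_b)
        (same if track_a == track_b else different).append(score)
    return same, different


# Hypothetical announcements as (track label, text) pairs.
announcements = [
    ("A", "The council approved the harbour bridge budget"),
    ("A", "The harbour bridge budget faces a council review"),
    ("B", "A new vaccine trial began at the city clinic"),
]

same_track, different_track = split_pairs(announcements)
print("same track:", same_track)
print("different track:", different_track)
```

In this toy input the single same-track pair shares most of its vocabulary, so it scores higher than either different-track pair, which is the direction of separation the caption reports. TF-IDF and BERT scores would need a vectoriser or an embedding model and are not sketched here.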