Constraint representation towards precise data-driven storytelling
Yu-Zhe Shi, Haotian Li, Lecheng Ruan, Huamin Qu
TL;DR
The paper addresses the challenge of automating data-driven storytelling while preserving both persuasive framing and evidential grounding. It proposes a constraint-centric framework built on two hierarchies—interpretation and articulation—that guide both narrative and visualization generation, bounded by domain-specific constraints represented as Domain-Specific Languages (DSLs). Through the Yellow River example, the authors illustrate how seed ideas, narrative structure, evidence interpretation, and photorealistic visuals can be coherently coordinated within a constrained space. They discuss integrating constraints into current workflows, automating constraint synthesis with AutoDSL, and the continued importance of human involvement to balance creativity with rigor. Overall, the work outlines a path toward scalable, precise, and stylistically adaptable data story generation that preserves artistic expression while ensuring scientific grounding.
Abstract
Data-driven storytelling serves as a crucial bridge for communicating ideas persuasively. However, the manual creation of data stories is a multifaceted, labor-intensive, and case-specific effort, which limits their broader application. As a result, automating the creation of data stories has emerged as a significant research thrust. Despite advances in Artificial Intelligence, the systematic generation of data stories remains challenging due to their hybrid nature: they must frame a perspective based on a seed idea in a top-down manner, similar to traditional storytelling, while coherently grounding insights in the given evidence in a bottom-up fashion, akin to data analysis. These dual requirements necessitate precise constraints on the permissible space of a data story. In this viewpoint, we propose integrating constraints into the data story generation process. Defined over the hierarchies of interpretation and articulation, constraints shape both narrations and illustrations to align with seed ideas and contextualized evidence. We identify a taxonomy and the required functionalities of these constraints. Although constraints can be heterogeneous and latent, we explore the potential to represent them in a computation-friendly fashion via Domain-Specific Languages. We believe that leveraging constraints will facilitate both the artistic and scientific aspects of data story generation.
