Narrative Trails: A Method for Coherent Storyline Extraction via Maximum Capacity Path Optimization
Fausto German, Brian Keith, Chris North
TL;DR
Narrative Trails reframes narrative extraction as a maximum capacity path problem on a coherence graph built from latent-space document representations, enabling efficient extraction of up to $k$ distinct storylines between user-specified endpoints. The method combines a projection-space representation (via embeddings, UMAP, and HDBSCAN), a coherence graph with base and sparse coherence measures, and a maximin path-optimization procedure with redundancy reduction. Empirical results across Wikispeedia and multiple datasets show that Narrative Trails achieves higher coherence and reliability than baselines and is faster than Narrative Maps, with demonstrated generalizability and potential for multimodal extensions. This approach offers a scalable, abstractive alternative for sensemaking and information synthesis in large, diverse corpora.
Abstract
Traditional information retrieval is primarily concerned with finding relevant information from large datasets without imposing a structure within the retrieved pieces of data. However, structuring information in the form of narratives--ordered sets of documents that form coherent storylines--allows us to identify, interpret, and share insights about the connections and relationships between the ideas presented in the data. Despite their significance, current approaches for algorithmically extracting storylines from data are scarce, with existing methods primarily relying on intricate word-based heuristics and auxiliary document structures. Moreover, many of these methods are difficult to scale to large datasets and general contexts, as they are designed to extract storylines for narrow tasks. In this paper, we propose Narrative Trails, an efficient, general-purpose method for extracting coherent storylines in large text corpora. Specifically, our method uses the semantic-level information embedded in the latent space of deep learning models to build a sparse coherence graph and extract narratives that maximize the minimum coherence of the storylines. By quantitatively evaluating our proposed methods on two distinct narrative extraction tasks, we show the generalizability and scalability of Narrative Trails in multiple contexts while also simplifying the extraction pipeline.
