Previously on the Stories: Recap Snippet Identification for Story Reading
Jiangnan Li, Qiujing Wang, Liyan Xu, Wenjie Pang, Mo Yu, Zheng Lin, Weiping Wang, Jie Zhou
TL;DR
This work introduces Recap Snippet Identification, the task of identifying snippets in the prior context that are temporally and causally linked to a target snippet, for both books and TV productions. It presents RECIDENT, a hand-crafted benchmark covering book and TV domains, with expert annotations, cross-language alignment, and a consistent 60-snippet history window. The study evaluates prompting-based LLMs, unsupervised Line2Note training, and supervised fine-tuning. The results reveal a gap between human performance and current models: Line2Note improves similarity-based models, while LLMs struggle as direct rankers. The findings indicate that proximity and explicit event information aid recap identification, and the authors propose a practical pipeline that combines lightweight models with LLM prompts to balance performance and efficiency, with implications for reading apps and narrative understanding systems.
Abstract
Similar to the "previously-on" scenes in TV shows, recaps can aid book reading by reminding readers of important elements from earlier text so that they can better follow the ongoing plot. Despite its usefulness, this application has not been well studied in the NLP community. We propose the first benchmark for this task, called Recap Snippet Identification, with a hand-crafted evaluation dataset. Our experiments show that the task is challenging for PLMs, LLMs, and our proposed methods, because it requires a deep understanding of the plot correlation between snippets.
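To make the task setup concrete, the following is a minimal sketch of a similarity-based ranker over a fixed history window. It is an illustration under stated assumptions, not the paper's method: the bag-of-words encoder stands in for a learned model, the function names (`bow`, `cosine`, `rank_recap_candidates`) are hypothetical, and breaking ties in favour of nearer snippets is a design choice made here. Only the 60-snippet window size comes from the benchmark description.

```python
import math
import re
from collections import Counter

HISTORY_WINDOW = 60  # window size taken from the benchmark description


def bow(text):
    """Lowercased bag-of-words counts (an assumed stand-in for a learned encoder)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_recap_candidates(snippets, target_index, top_k=3):
    """Return indices of the top_k preceding snippets most similar to the target.

    Only the HISTORY_WINDOW snippets immediately before the target are
    considered. Ties are broken in favour of snippets closer to the target
    (an assumption made for this sketch).
    """
    start = max(0, target_index - HISTORY_WINDOW)
    target_vec = bow(snippets[target_index])
    scored = [
        (cosine(bow(snippets[i]), target_vec), i)
        for i in range(start, target_index)
    ]
    scored.sort(key=lambda pair: (-pair[0], -pair[1]))
    return [i for _, i in scored[:top_k]]


if __name__ == "__main__":
    story = [
        "The detective found a bloody knife in the garden.",
        "Anna baked bread for the village fair.",
        "The mayor gave a long speech about taxes.",
        "The detective showed the knife to the coroner.",
    ]
    print(rank_recap_candidates(story, target_index=3, top_k=1))
```

Surface-level similarity of this kind captures lexical overlap only; the benchmark's emphasis on plot correlation suggests that such a ranker would serve as a baseline rather than a solution.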
