SCORE: Story Coherence and Retrieval Enhancement for AI Narratives
Qiang Yi, Yangfan He, Jianhui Wang, Xinyuan Song, ShiYao Qian, Xinhang Yuan, Yi Xin, Yijin Wang, Jingqun Tang, Yuchen Li, Junjiang Lin, Hongyang He, Zhen Tian, Tianxiang Xu, Keqin Li, Kuan Lu, Menghao Huo, Jiaqi Chen, Miao Zhang, Tianyu Shi, Jianyuan Ni
TL;DR
SCORE addresses long-form narrative coherence in AI-generated stories with a retrieval-augmented framework that tracks the status of key items and summarizes each episode. It combines dynamic state tracking, context-aware summaries, and RAG with sentiment- and similarity-based retrieval to detect and resolve continuity errors. Empirical results show significant improvements in coherence, consistency, and item continuity across multiple LLMs and genres, and ablations highlight the importance of dynamic tracking and contextual summaries. The approach is scalable and compatible with multiple LLMs, though retrieval accuracy and efficiency remain areas for further work.
Abstract
Large Language Models (LLMs) can generate creative and engaging narratives from user-specified input, but maintaining coherence and emotional depth throughout these AI-generated stories remains a challenge. In this work, we propose SCORE, a framework for Story Coherence and Retrieval Enhancement, designed to detect and resolve narrative inconsistencies. SCORE tracks the status of key items, generates episode summaries, and uses a Retrieval-Augmented Generation (RAG) approach to identify related episodes and enhance the overall story structure. Experiments on stories generated by multiple LLMs demonstrate that SCORE significantly improves the consistency and stability of narrative coherence compared to baseline GPT models, providing a more robust method for evaluating and refining AI-generated narratives.
