Enhancing Incremental Summarization with Structured Representations
EunJeong Hwang, Yichao Zhou, James Bradley Wendt, Beliz Gunel, Nguyen Vo, Jing Xie, Sandeep Tata
TL;DR
The paper addresses the difficulty of generating coherent summaries from long, multi-source inputs. It introduces structured, JSON-based memory representations and a Chain-of-Key (CoK) mechanism that incrementally updates that memory as new sources arrive, rather than rebuilding it for each source. Structured memory alone improves summarization performance on SUMIE and BooookScore by 40% and 14%, respectively. CoK, which starts from an explicit schema-driven initial summary and applies two operations (Update and Add), yields further gains of 7% and 4%. The method retains information well under token-limited contexts and consistently outperforms unitary generation baselines across two LLMs, though it can over-detail narratives in some cases and incurs higher evaluation costs. Overall, structured memory offers a practical path to more accurate and coherent incremental summaries of long documents in real-world settings; future work aims to refine detail filtering and extend applicability.
Abstract
Large language models (LLMs) often struggle with processing extensive input contexts, which can lead to redundant, inaccurate, or incoherent summaries. Recent methods have used unstructured memory to incrementally process these contexts, but they still suffer from information overload due to the volume of unstructured data handled. In our study, we introduce structured knowledge representations ($GU_{json}$), which significantly improve summarization performance by 40% and 14% across two public datasets. Most notably, we propose the Chain-of-Key strategy ($CoK_{json}$) that dynamically updates or augments these representations with new information, rather than recreating the structured memory for each new source. This method further enhances performance by 7% and 4% on the datasets.
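To make the idea of updating or augmenting a structured memory concrete, the following is a minimal sketch and not the authors' implementation. It assumes the memory is a flat JSON object and that a model has already proposed a list of key-level operations. The function name `apply_cok_ops`, the operation format, and the example keys are all hypothetical.

```python
import json
from typing import Any


def apply_cok_ops(memory: dict[str, Any], ops: list[dict[str, Any]]) -> dict[str, Any]:
    """Apply key-level operations to a copy of a JSON-style memory.

    Hypothetical operation format (an assumption, not taken from the paper):
      {"op": "update", "key": ..., "value": ...}  overwrite an existing key
      {"op": "add",    "key": ..., "value": ...}  introduce a new key
    """
    # Deep-copy via a JSON round-trip so the caller's memory is left untouched.
    updated = json.loads(json.dumps(memory))
    for op in ops:
        kind, key, value = op["op"], op["key"], op["value"]
        if kind == "update":
            if key not in updated:
                raise KeyError(f"cannot update missing key: {key!r}")
            updated[key] = value
        elif kind == "add":
            if key in updated:
                raise KeyError(f"cannot add existing key: {key!r}")
            updated[key] = value
        else:
            raise ValueError(f"unknown operation: {kind!r}")
    return updated


# Memory built from earlier sources (illustrative values only).
memory = {"location": "Downtown", "price": "moderate"}

# In an LLM pipeline these operations would be proposed by the model after
# reading a new source; here they are hard-coded for illustration.
ops = [
    {"op": "update", "key": "price", "value": "expensive"},
    {"op": "add", "key": "parking", "value": "street only"},
]

new_memory = apply_cok_ops(memory, ops)
print(json.dumps(new_memory, indent=2))
```

Only the keys touched by a new source change, so untouched entries carry over verbatim instead of being regenerated, which is the property the incremental strategy relies on.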
