Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs
Haozhen Zhang, Tao Feng, Jiaxuan You
TL;DR
This work tackles long-context global summarization by enhancing retrieval-augmented generation (RAG) with a graph built from LLM-generated historical responses. The Graph of Records (GoR) links retrieved text chunks to the responses generated from them and learns node embeddings with a graph neural network. Training uses a self-supervised, BERTScore-based objective computed over simulated queries. Key contributions include the graph construction method, a dual-loss training regime (contrastive and ranking), and an efficient retrieval mechanism that improves Rouge scores across four long-context datasets. The results show significant performance gains over baselines and indicate GoR's potential for more effective and scalable long-context summarization in practical settings.
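The TL;DR names a dual-loss regime (contrastive and ranking) supervised by BERTScore. The exact formulation is not given in this section, so the sketch below shows only one common way to combine the two losses, in plain Python. Several parts of it are assumptions and are not taken from the paper. Node and query embeddings are treated as precomputed. The "positive" node is taken to be the one with the highest BERTScore against the reference summary. The same BERTScore values supply the target ranking. The function names, the InfoNCE temperature `tau` and the `margin` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def contrastive_loss(query, nodes, pos_idx, tau=0.1):
    """InfoNCE: negative log-softmax of the positive node's scaled similarity.

    Assumption: the positive is the node with the highest BERTScore.
    """
    logits = [cosine(query, n) / tau for n in nodes]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[pos_idx]


def ranking_loss(query, nodes, ranked_idx, margin=0.1):
    """Pairwise margin loss: a node ranked higher should be more similar to the query.

    Assumption: ranked_idx orders nodes by descending BERTScore.
    """
    sims = [cosine(query, n) for n in nodes]
    total, pairs = 0.0, 0
    for i in range(len(ranked_idx)):
        for j in range(i + 1, len(ranked_idx)):
            better, worse = ranked_idx[i], ranked_idx[j]
            total += max(0.0, margin - (sims[better] - sims[worse]))
            pairs += 1
    return total / pairs if pairs else 0.0


# Usage: toy embeddings for one simulated query and hypothetical BERTScore values.
query = [1.0, 0.0]
nodes = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
bert_scores = [0.9, 0.2, 0.5]
ranked = sorted(range(len(nodes)), key=lambda i: -bert_scores[i])  # [0, 2, 1]
loss = contrastive_loss(query, nodes, ranked[0]) + ranking_loss(query, nodes, ranked)
```

In this toy case the embeddings already order the nodes the same way BERTScore does. The ranking term is therefore zero and the contrastive term is small. Reversing `ranked` produces a positive ranking loss. The contrastive term pulls the single best node toward the query. The ranking term constrains the relative order of all the other nodes.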
Abstract
Retrieval-augmented generation (RAG) has revitalized Large Language Models (LLMs) by injecting non-parametric factual knowledge. Compared with long-context LLMs, RAG is considered a more concise and lightweight summarization tool: it can interact with LLMs multiple times using diverse queries to obtain comprehensive responses. However, existing approaches largely neglect and discard the LLM-generated historical responses, even though these responses may contain insightful information, which leads to suboptimal results. In this paper, we propose graph of records (GoR), which leverages historical responses generated by LLMs to enhance RAG for long-context global summarization. Inspired by the retrieve-then-generate paradigm of RAG, we construct a graph by establishing an edge between the retrieved text chunks and the corresponding LLM-generated response. To further uncover the intricate correlations between them, GoR features a graph neural network and an elaborately designed BERTScore-based objective for self-supervised model training. This objective allows supervision signals from the reference summaries to backpropagate seamlessly to the node embeddings. We comprehensively compare GoR with 12 baselines across four long-context summarization datasets, and the results indicate that our proposed method achieves the best performance (e.g., improvements of 15%, 8%, and 19% over retrievers in Rouge-L, Rouge-1, and Rouge-2, respectively, on the WCEP dataset). Extensive experiments further demonstrate the effectiveness of GoR.
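The graph construction described in the abstract adds an edge between each LLM-generated response and the text chunks retrieved to produce it. The sketch below is a minimal Python version of that step. A few details are assumptions made for illustration. The input format of `(query, retrieved_chunk_ids, response_text)` tuples is hypothetical, and so are the node-id scheme and both function names. The `mean_aggregate` step is a parameter-free stand-in for message passing. It is not the learned graph neural network GoR uses.

```python
def build_graph_of_records(records):
    """Build an undirected graph linking each LLM response to the chunks retrieved for it.

    records: list of (query, retrieved_chunk_ids, response_text) tuples (hypothetical format).
    Returns an adjacency dict mapping node id -> set of neighbour node ids.
    """
    adj = {}
    for k, (_query, chunk_ids, _response) in enumerate(records):
        resp = f"resp:{k}"
        adj.setdefault(resp, set())
        for cid in chunk_ids:
            chunk = f"chunk:{cid}"
            adj[resp].add(chunk)                    # response -> chunk edge
            adj.setdefault(chunk, set()).add(resp)  # chunk -> response edge
    return adj


def mean_aggregate(adj, feats):
    """One parameter-free message-passing step: average each node with its neighbours."""
    out = {}
    for node, nbrs in adj.items():
        vecs = [feats[node]] + [feats[n] for n in nbrs]
        out[node] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return out


# Usage: two simulated queries whose retrievals share chunk 1.
records = [("q1", [0, 1], "response 1"), ("q2", [1, 2], "response 2")]
adj = build_graph_of_records(records)
feats = {"resp:0": [1.0], "resp:1": [3.0],
         "chunk:0": [0.0], "chunk:1": [2.0], "chunk:2": [4.0]}
agg = mean_aggregate(adj, feats)
print(sorted(adj["chunk:1"]))  # ['resp:0', 'resp:1']
print(agg["chunk:1"])          # [2.0]
```

In the example, chunk 1 is retrieved for both simulated queries, so it becomes a shared neighbour of two responses. Because of this, information can flow between different historical responses through the chunks they have in common. In practice the node features would be text embeddings rather than toy scalars.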
