RAG without Forgetting: Continual Query-Infused Key Memory

Yuntong Hu; Sha Li; Naren Ramakrishnan; Liang Zhao

TL;DR

This work tackles the retrieval bottleneck in retrieval-augmented generation, where query-time adaptations are stateless and offline key-based index updates are prone to drift. It proposes Evolving Retrieval Memory (ERM), a training-free framework that converts validated query expansions into persistent, norm-bounded updates to document keys using correctness-gated feedback and selective attribution, with convergence guarantees and zero inference-time overhead. The authors prove that query and key expansion are theoretically equivalent under common similarity measures and establish convergence and stability guarantees for the evolving keys. Empirically, ERM delivers consistent retrieval and generation gains on the BEIR and BRIGHT benchmarks (13 domains) at near-native retrieval latency, with the largest gains on reasoning-intensive tasks, and it remains robust across multiple retrievers and indexing schemes. Overall, ERM enables cumulative learning in a RAG setting without retraining.

Abstract

Retrieval-augmented generation (RAG) systems commonly improve robustness via query-time adaptations such as query expansion and iterative retrieval. While effective, these approaches are inherently stateless: adaptations are recomputed for each query and discarded thereafter, precluding cumulative learning and repeatedly incurring inference-time cost. Index-side approaches like key expansion introduce persistence but rely on offline preprocessing or heuristic updates that are weakly aligned with downstream task utility, leading to semantic drift and noise accumulation. We propose Evolving Retrieval Memory (ERM), a training-free framework that transforms transient query-time gains into persistent retrieval improvements. ERM updates the retrieval index through correctness-gated feedback, selectively attributes atomic expansion signals to the document keys they benefit, and progressively evolves keys via stable, norm-bounded updates. We show that query and key expansion are theoretically equivalent under standard similarity functions and prove convergence of ERM's selective updates, amortizing optimal query expansion into a stable index with zero inference-time overhead. Experiments on BEIR and BRIGHT across 13 domains demonstrate consistent gains in retrieval and generation, particularly on reasoning-intensive tasks, at native retrieval speed.
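The update loop described in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the function name `evolve_keys`, the parameters `step` and `max_norm`, the use of dot-product scores as attribution signals, and norm clipping as the bounding rule are all hypothetical choices made here for concreteness.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def clip_norm(v: Sequence[float], max_norm: float) -> Vector:
    # Rescale v so that its L2 norm never exceeds max_norm.
    n = math.sqrt(dot(v, v))
    if n <= max_norm:
        return list(v)
    return [x * max_norm / n for x in v]


def evolve_keys(
    keys: Dict[str, Vector],
    retrieved_ids: Sequence[str],
    expansion: Vector,
    answer_correct: bool,
    step: float = 0.1,      # hypothetical step size
    max_norm: float = 1.0,  # hypothetical norm bound
) -> Dict[str, Vector]:
    """Return a copy of `keys` after one gated, norm-bounded update."""
    updated = {doc_id: list(k) for doc_id, k in keys.items()}
    # Correctness gate: only validated expansions may change the index.
    if not answer_correct or not retrieved_ids:
        return updated
    # Selective attribution (assumed form): softmax over how well each
    # retrieved key already matches the expansion embedding.
    weights = softmax([dot(keys[d], expansion) for d in retrieved_ids])
    for doc_id, w in zip(retrieved_ids, weights):
        moved = [k + step * w * e for k, e in zip(keys[doc_id], expansion)]
        updated[doc_id] = clip_norm(moved, max_norm)
    return updated


# Toy usage with 2-dimensional embeddings.
keys = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
expansion = [0.6, 0.8]
new_keys = evolve_keys(keys, ["a", "b"], expansion, answer_correct=True)
```

In this sketch, documents that were not retrieved keep their keys unchanged, and an expansion that did not lead to a correct answer leaves the whole index untouched, so later queries pay no extra cost at retrieval time.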

Paper Structure (67 sections, 5 theorems, 15 equations, 7 figures, 5 tables)

Introduction
Related Work
Memory-Augmented Retrieval
Retrieval-Augmented Generation (RAG)
Query Expansion and Rewriting
Problem Formulation
Retrieval Systems.
Retrieval-Augmented Generation (RAG).
Query Expansion.
Evolving Retrieval Memory
Correctness-Gated Feedback Verifier
Unified correctness signal.
Selective Expansion Attribution
Progressive Key Evolution
Empirical Evaluation
...and 52 more sections

Key Result

Proposition 4.1

Assume the retriever similarity $\mathrm{sim}(\cdot,\cdot)$ is bilinear or monotone under additive embeddings. Then, for any expansion unit $e_j$, expanding the query with $e_j$ is equivalent to expanding the keys with $e_j$ with respect to retrieval ranking under standard similarity operators.
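
As a hedged illustration of the bilinear case (not the paper's own equation), take dot-product similarity $\mathrm{sim}(q, k) = q^\top k$ with additive composition of embeddings. The symbols $q$ for a query embedding and $k_i$ for a document key are assumed here for illustration. Bilinearity gives

$$
\mathrm{sim}(q + e_j,\, k_i) = \mathrm{sim}(q, k_i) + \mathrm{sim}(e_j, k_i),
\qquad
\mathrm{sim}(q,\, k_i + e_j) = \mathrm{sim}(q, k_i) + \mathrm{sim}(q, e_j).
$$

In both cases the expansion enters as an additive term on top of the base score $\mathrm{sim}(q, k_i)$, which suggests how the effect of an expansion might be stored on the key side rather than recomputed for each query.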

Figures (7)

Figure 1: Comparison of Query Expansion (QE), Key Expansion (KE), and Evolving Retrieval Memory (ERM). Left: QE aligns queries to document space via inference-time expansions that are discarded after each query. Middle: KE persistently aligns documents to queries through offline index enrichment but incurs high cost and drift. Right: ERM converts validated query expansions into stable key updates, progressively aligning query and document distributions with no inference-time overhead.
Figure 2: Illustration of ERM. (a) Correctness-gated verification filters task-validated query expansion units. (b) Selective attribution assigns expansion benefits to retrieved documents. (c) Softmax-normalized accumulation updates document keys.
Figure 3: Performance vs. latency trade-off. Comparison of retrieval performance (nDCG@10, left bars) and inference time (log scale, right bars) for Naive Retrieval, ERM, and Query Expansion (HyDE). ERM achieves performance competitive with or exceeding HyDE while maintaining near-native retrieval latency (ms vs. seconds). Configuration: GTE-base retriever, 0.5 split rate, title indexing.
Figure 4: ERM performance as a function of adaptation budget. nDCG@10 on held-out queries after evolving keys using an increasing fraction (0.3–0.8) of disjoint adaptation queries, with keys reset for each split. Results are shown for the GTE-base retriever with HyDE query expansion and title-based indexing. Performance improves monotonically as ERM is allowed to adapt using more past queries.
Figure 5: Index method comparison using GTE-base retriever across all datasets. Title indexing dominates for StackExchange Q&A domains, while abstract/keywords work better for technical and mathematical content. The consistent advantage of title indexing for Q&A suggests that concise document representations reduce noise in dense retrieval.
...and 2 more figures

Theorems & Definitions (11)

Proposition 4.1: Query-Key Equivalence Under Semantic Composition
Proposition 4.2: Cumulative Consistency of Attribution Scores
Theorem 4.3: Stability of Key Sequences
Corollary 4.4: Expected Consistency Under Additive Similarity
Proposition 4.5: Amortized Expansion Cost
proof
Remark 1.1: Snapshot approximation
proof
proof
Remark 1.2: Scope of applicability
...and 1 more
