RAG without Forgetting: Continual Query-Infused Key Memory

Yuntong Hu, Sha Li, Naren Ramakrishnan, Liang Zhao

TL;DR

This work tackles the retrieval bottleneck in retrieval-augmented generation, where query-time adaptations are stateless and offline key-side index updates are prone to drift. It proposes Evolving Retrieval Memory (ERM), a training-free framework that converts validated query expansions into persistent, norm-bounded updates to document keys using correctness-gated feedback and selective attribution, with convergence guarantees and zero inference-time overhead. The authors prove that query expansion and key expansion are equivalent under common similarity measures and establish convergence and stability guarantees for the evolving keys. Empirically, ERM delivers consistent retrieval and generation gains on the BEIR and BRIGHT benchmarks (13 domains) at near-native retrieval latency, particularly on reasoning-intensive tasks, and is robust across multiple retrievers and indexing schemes. Overall, ERM enables cumulative learning in a RAG setting without retraining, offering scalable, efficient improvements for knowledge-intensive applications.

Abstract

Retrieval-augmented generation (RAG) systems commonly improve robustness via query-time adaptations such as query expansion and iterative retrieval. While effective, these approaches are inherently stateless: adaptations are recomputed for each query and discarded thereafter, precluding cumulative learning and repeatedly incurring inference-time cost. Index-side approaches like key expansion introduce persistence but rely on offline preprocessing or heuristic updates that are weakly aligned with downstream task utility, leading to semantic drift and noise accumulation. We propose Evolving Retrieval Memory (ERM), a training-free framework that transforms transient query-time gains into persistent retrieval improvements. ERM updates the retrieval index through correctness-gated feedback, selectively attributes atomic expansion signals to the document keys they benefit, and progressively evolves keys via stable, norm-bounded updates. We show that query and key expansion are theoretically equivalent under standard similarity functions and prove convergence of ERM's selective updates, amortizing optimal query expansion into a stable index with zero inference-time overhead. Experiments on BEIR and BRIGHT across 13 domains demonstrate consistent gains in retrieval and generation, particularly on reasoning-intensive tasks, at native retrieval speed.
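
The three mechanisms named in the abstract (correctness gating, selective attribution, norm-bounded key updates) can be illustrated with a small sketch. This is not the authors' algorithm: the function `evolve_keys`, the softmax attribution over expansion-key similarity, the additive update, and the parameters `step` and `max_shift` are all assumptions chosen for illustration, since the abstract does not give the update rule.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def evolve_keys(
    keys: Dict[str, Vector],
    base_keys: Dict[str, Vector],
    expansion: Vector,
    retrieved_ids: Sequence[str],
    answer_correct: bool,
    step: float = 0.1,       # hypothetical update size
    max_shift: float = 0.5,  # hypothetical bound on drift from the original key
) -> Dict[str, Vector]:
    """Return a new key table; the input tables are left unmodified."""
    # Gate: only an expansion that led to a correct answer may touch the index.
    if not answer_correct or not retrieved_ids:
        return {doc_id: list(vec) for doc_id, vec in keys.items()}

    # Attribution (assumed form): softmax over expansion-key similarity,
    # so documents the expansion matches best receive most of the update.
    weights = softmax([dot(expansion, keys[d]) for d in retrieved_ids])

    updated = {doc_id: list(vec) for doc_id, vec in keys.items()}
    for doc_id, weight in zip(retrieved_ids, weights):
        moved = [k + step * weight * e for k, e in zip(keys[doc_id], expansion)]
        # Norm bound: clip the total drift away from the original key.
        drift = [m - b for m, b in zip(moved, base_keys[doc_id])]
        size = math.sqrt(dot(drift, drift))
        if size > max_shift:
            drift = [x * max_shift / size for x in drift]
        updated[doc_id] = [b + x for b, x in zip(base_keys[doc_id], drift)]
    return updated
```

In this sketch the clipping step is what keeps repeated updates from moving a key arbitrarily far from its original embedding, and documents that were not retrieved are never modified. Retrieval itself is unchanged, so any cost is paid at update time rather than at query time.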

Paper Structure

This paper contains 67 sections, 5 theorems, 15 equations, 7 figures, and 5 tables.

Key Result

Proposition 4.1

Assume the retriever similarity $\mathrm{sim}(\cdot,\cdot)$ is bilinear or monotone under additive embeddings. Then for any expansion unit $e_j$, expanding the query with $e_j$ is equivalent to expanding the key with $e_j$ with respect to retrieval ranking under standard similarity operators.
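
As a worked illustration of the bilinear case only, take the inner product as the similarity and assume that expansion is plain vector addition (the paper's composition operator may differ). For a query $q$ and a document key $k_i$, bilinearity gives

$$
\mathrm{sim}(q + e_j,\, k_i) = \langle q, k_i\rangle + \langle e_j, k_i\rangle,
\qquad
\mathrm{sim}(q,\, k_i + e_j) = \langle q, k_i\rangle + \langle q, e_j\rangle .
$$

Under these assumptions, either form adds a single term to the base score $\langle q, k_i\rangle$. In the key-side form, $e_j$ is stored once in $k_i$, so the extra term $\langle q, e_j\rangle$ is obtained by ordinary retrieval with the unexpanded query.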

Figures (7)

  • Figure 1: Comparison of Query Expansion (QE), Key Expansion (KE), and Evolving Retrieval Memory (ERM). Left: QE aligns queries to document space via inference-time expansions that are discarded after each query. Middle: KE persistently aligns documents to queries through offline index enrichment but incurs high cost and drift. Right: ERM converts validated query expansions into stable key updates, progressively aligning query and document distributions with no inference-time overhead.
  • Figure 2: Illustration of ERM. (a) Correctness-gated verification filters task-validated query expansion units. (b) Selective attribution assigns expansion benefits to retrieved documents. (c) Softmax-normalized accumulation updates document keys.
  • Figure 3: Performance vs. latency trade-off. Comparison of retrieval performance (nDCG@10, left bars) and inference time (log scale, right bars) for Naive Retrieval, ERM, and Query Expansion (HyDE). ERM achieves performance competitive with or exceeding HyDE while maintaining near-native retrieval latency (ms vs. seconds). Configuration: GTE-base retriever, 0.5 split rate, title indexing.
  • Figure 4: ERM performance as a function of adaptation budget. nDCG@10 on held-out queries after evolving keys using an increasing fraction (0.3–0.8) of disjoint adaptation queries, with keys reset for each split. Results are shown for the GTE-base retriever with HyDE query expansion and title-based indexing. Performance improves monotonically as ERM is allowed to adapt using more past queries.
  • Figure 5: Index method comparison using GTE-base retriever across all datasets. Title indexing dominates for StackExchange Q&A domains, while abstract/keywords work better for technical and mathematical content. The consistent advantage of title indexing for Q&A suggests that concise document representations reduce noise in dense retrieval.
  • ...and 2 more figures
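
Figures 3 and 4 report nDCG@10. For readers unfamiliar with the metric, a minimal sketch of one common formulation (linear gain, log2 rank discount) follows. The function names are hypothetical, and the paper's evaluation code may use a different gain variant or tooling.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k positions (rank 1 has discount 1)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked: Sequence[float], judged: Sequence[float], k: int = 10) -> float:
    """nDCG@k for a single query.

    ranked: relevance grades of the returned documents, in ranked order.
    judged: relevance grades of all judged documents for the query,
            used to build the ideal ranking.
    """
    ideal = dcg_at_k(sorted(judged, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal if ideal > 0 else 0.0
```

The ideal ranking is built from all judged documents rather than only the returned ones, so a system that misses a relevant document cannot score a perfect 1.0.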

Theorems & Definitions (11)

  • Proposition 4.1: Query-Key Equivalence Under Semantic Composition
  • Proposition 4.2: Cumulative Consistency of Attribution Scores
  • Theorem 4.3: Stability of Key Sequences
  • Corollary 4.4: Expected Consistency Under Additive Similarity
  • Proposition 4.5: Amortized Expansion Cost
  • proof
  • Remark 1.1: Snapshot approximation
  • proof
  • proof
  • Remark 1.2: Scope of applicability
  • ...and 1 more