Dynamic Parametric Retrieval Augmented Generation for Test-time Knowledge Enhancement

Yuqiao Tan, Shizhu He, Huanxuan Liao, Jun Zhao, Kang Liu

TL;DR

DyPRAG is a dynamic parametric retrieval-augmented generation framework that converts documents into parametric knowledge at test time via a lightweight hypernetwork translator, reducing both online inference cost and offline training and storage costs. By learning a general mapping from document embeddings to adapter parameters, DyPRAG matches or exceeds the performance of offline PRAG while generalizing well and integrating with contextual knowledge in a plug-and-play manner. The approach also enables DyPRAG-Combine, which fuses parametric and contextual knowledge to further reduce RAG hallucination and improve robustness, including in out-of-distribution scenarios. Experiments across multiple benchmarks (2WQA, HQA, CWQ, SQA, IIRC) and model families and scales (Qwen, LLaMA) show improved knowledge fusion, reduced hallucination, and favorable cost metrics. The work offers a practical RAG paradigm for real-world QA tasks and points to further work on interpretability and deployment.

Abstract

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by retrieving relevant documents from external sources and incorporating them into the context. While it improves reliability by providing factual texts, it significantly increases inference costs as context length grows and introduces the challenging issue of RAG hallucination, primarily caused by the lack of corresponding parametric knowledge in LLMs. An efficient solution is to enhance the knowledge of LLMs at test time. Parametric RAG (PRAG) addresses this by embedding documents into LLM parameters to perform test-time knowledge enhancement, effectively reducing inference costs through offline training. However, its high training and storage costs, along with limited generalization ability, significantly restrict its practical adoption. To address these challenges, we propose Dynamic Parametric RAG (DyPRAG), a novel framework that leverages a lightweight parameter translator model to efficiently convert documents into parametric knowledge. DyPRAG not only reduces inference, training, and storage costs but also dynamically generates parametric knowledge, seamlessly enhancing the knowledge of LLMs and resolving knowledge conflicts in a plug-and-play manner at test time. Extensive experiments on multiple datasets demonstrate the effectiveness and generalization capabilities of DyPRAG, offering a powerful and practical RAG paradigm that enables superior knowledge fusion and mitigates RAG hallucination in real-world applications. Our code is available at https://github.com/Trae1ounG/DyPRAG.
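The core idea in the abstract, a lightweight translator that turns a document representation into adapter parameters applied at test time, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ParameterTranslator`, the two-layer tanh MLP, all dimensions, and the random initialization are hypothetical choices made only to show the data flow from a document embedding to a low-rank update of the form W + BA.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) list-of-lists matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian-initialised matrix (illustrative initialisation only)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ParameterTranslator:
    """Hypothetical hypernetwork: document embedding -> LoRA factors (A, B)."""

    def __init__(self, embed_dim, hidden_dim, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        self.w_hidden = random_matrix(hidden_dim, embed_dim, rng)
        # One output vector holds the flattened A (rank x d_in) and B (d_out x rank).
        self.w_out = random_matrix(rank * (d_in + d_out), hidden_dim, rng)

    def __call__(self, doc_embedding):
        hidden = [math.tanh(h) for h in matvec(self.w_hidden, doc_embedding)]
        flat = matvec(self.w_out, hidden)
        split = self.rank * self.d_in
        lora_a = [flat[i * self.d_in:(i + 1) * self.d_in] for i in range(self.rank)]
        tail = flat[split:]
        lora_b = [tail[i * self.rank:(i + 1) * self.rank] for i in range(self.d_out)]
        return lora_a, lora_b


def lora_forward(base_weight, lora_a, lora_b, x):
    """Compute (W + B A) x without materialising the merged weight."""
    base = matvec(base_weight, x)
    low_rank = matvec(lora_b, matvec(lora_a, x))
    return [b + l for b, l in zip(base, low_rank)]


# Toy usage: one document embedding produces one document-specific adapter.
rng = random.Random(1)
translator = ParameterTranslator(embed_dim=8, hidden_dim=4, d_in=6, d_out=5, rank=2)
doc_embedding = [rng.gauss(0.0, 1.0) for _ in range(8)]
lora_a, lora_b = translator(doc_embedding)

base_weight = random_matrix(5, 6, rng)  # stands in for a frozen LLM weight
x = [1.0] * 6
y = lora_forward(base_weight, lora_a, lora_b, x)
```

In this sketch the base weight is never modified. Only the generated factors change from document to document, which is what would make such an adapter plug-and-play.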

Paper Structure

This paper contains 62 sections, 7 equations, 17 figures, 8 tables.

Figures (17)

  • Figure 1: Compared to RAG and PRAG, the proposed DyPRAG offers multiple advantages, including lower inference, training, and storage costs, strong generalization ability, and mitigation of RAG hallucination.
  • Figure 2: An illustration of the DyPRAG method. In the offline phase, Stage 1 follows the same parameterization process as PRAG to collect Doc-Param pairs. In Stage 2, we train the parameter translator $\mathcal{F}^\prime_\phi$ to learn the mapping function from documents to parameters. During the online Stage 3, the trained $\mathcal{F}^\prime_\phi$ dynamically generates LoRA modules to enhance LLMs' knowledge at test time.
  • Figure 3: Comparison between DyPRAG-Combine and standard RAG, as judged by GPT-4o.
  • Figure 4: Performance of Qwen2.5-1.5B with a varying number of injected documents.
  • Figure 5: Ablation study of varying training dataset size for DyPRAG. The backbone model is LLaMA3.2-1B.
  • ...and 12 more figures
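
The three stages described in the Figure 2 caption can be sketched end to end on toy data. Everything here is an assumption made for illustration: the Doc-Param pairs are synthetic rather than produced by PRAG-style training, the translator is a single linear map rather than the paper's $\mathcal{F}^\prime_\phi$, and the mean-squared-error objective is a stand-in, since this section does not state the training loss.

```python
import random


def predict(weights, embedding):
    """Linear translator: flattened parameters = weights @ embedding."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in weights]


def mse(weights, pairs):
    """Mean squared error between predicted and target parameters."""
    total, count = 0.0, 0
    for embedding, target in pairs:
        for p, t in zip(predict(weights, embedding), target):
            total += (p - t) ** 2
            count += 1
    return total / count


def train_translator(pairs, embed_dim, param_dim, lr=0.05, epochs=200):
    """Stage 2 (sketch): fit the translator with per-sample gradient descent."""
    weights = [[0.0] * embed_dim for _ in range(param_dim)]
    for _ in range(epochs):
        for embedding, target in pairs:
            pred = predict(weights, embedding)
            for i in range(param_dim):
                grad_scale = 2.0 * (pred[i] - target[i])
                for j in range(embed_dim):
                    weights[i][j] -= lr * grad_scale * embedding[j]
    return weights


rng = random.Random(0)
embed_dim, param_dim = 4, 6

# Stage 1 (stand-in): synthetic Doc-Param pairs from a hidden linear map.
# In the real pipeline these parameters would come from per-document training.
hidden_map = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(param_dim)]
pairs = []
for _ in range(20):
    emb = [rng.uniform(-1, 1) for _ in range(embed_dim)]
    pairs.append((emb, predict(hidden_map, emb)))

# Stage 2: train the translator on the collected pairs.
initial_loss = mse([[0.0] * embed_dim for _ in range(param_dim)], pairs)
translator = train_translator(pairs, embed_dim, param_dim)
final_loss = mse(translator, pairs)

# Stage 3: generate parameters for an unseen document, with no per-document training.
new_doc = [rng.uniform(-1, 1) for _ in range(embed_dim)]
generated_params = predict(translator, new_doc)
```

The design point this illustrates is that per-document optimization happens only for the Stage 1 training pairs. A document that arrives at test time needs a single forward pass through the translator.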