HeteRAG: A Heterogeneous Retrieval-augmented Generation Framework with Decoupled Knowledge Representations

Peiru Yang, Xintian Li, Zhiyang Hu, Jiapeng Wang, Jinhua Yin, Huili Wang, Lizhi He, Shuai Yang, Shangguang Wang, Yongfeng Huang, Tao Qi

TL;DR

The paper tackles the mismatch between what retrieval and generation each require from knowledge chunks in retrieval-augmented generation (RAG). It introduces HeteRAG, a heterogeneous RAG framework that decouples knowledge chunk representations: retrieval uses context-enriched multi-granular signals and global metadata to improve recall, while generation uses concise, standalone chunks for efficient, precise answers. An adaptive prompt-tuning strategy aligns the retriever with this heterogeneous setup. The authors formulate the retrieval-generation problem, present the dual-path knowledge representations, and train soft prompts with a contrastive (InfoNCE) objective to adapt the retriever to domain-specific corpora. Experiments across BEIR datasets, multiple embedding models, and three LLMs show consistent gains in retrieval and end-to-end QA, with robustness to chunk size and top-$k$ retrieval settings.
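As a rough illustration of the contrastive objective named above, the sketch below computes an InfoNCE loss for one query against one positive and several negative chunk embeddings. The use of cosine similarity, the temperature value, and the function names `cosine` and `info_nce` are assumptions made for illustration; the paper's exact formulation and its soft-prompt parameters are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(query, positive, negatives, temperature=0.05):
    """InfoNCE loss: negative log-softmax of the positive's similarity.

    The temperature of 0.05 is an illustrative default, not a value
    taken from the paper.
    """
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    q = [1.0, 0.0]
    print(info_nce(q, positive=[1.0, 0.0], negatives=[[0.0, 1.0]]))
    print(info_nce(q, positive=[0.0, 1.0], negatives=[[1.0, 0.0]]))
```

The loss is small when the query is closer to the positive than to any negative and grows when a negative outranks the positive, which is the signal a prompt-tuned retriever would be trained to minimize.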

Abstract

Retrieval-augmented generation (RAG) methods can enhance the performance of LLMs by incorporating retrieved knowledge chunks into the generation process. The retrieval and generation steps usually have different requirements for these knowledge chunks. The retrieval step benefits from comprehensive information to improve retrieval accuracy, whereas excessively long chunks may introduce redundant contextual information, thereby diminishing both the effectiveness and efficiency of the generation process. However, existing RAG methods typically employ identical representations of knowledge chunks for both retrieval and generation, resulting in suboptimal performance. In this paper, we propose a heterogeneous RAG framework (HeteRAG) that decouples the representations of knowledge chunks for retrieval and generation, thereby enhancing the LLMs in both effectiveness and efficiency. Specifically, we use short chunks to represent knowledge for the generation step, and use the corresponding chunk together with its contextual information from multi-granular views to enhance retrieval accuracy. We further introduce an adaptive prompt tuning method that adapts the retrieval model to the heterogeneous retrieval-augmented generation process. Extensive experiments demonstrate that HeteRAG achieves significant improvements compared to baselines.
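The decoupling described in the abstract can be sketched as two views of the same chunk: an enriched view that is embedded for retrieval, and the short chunk alone that is passed to the LLM. Everything in this sketch is hypothetical: the `Chunk` fields, the bracketed tags, and the neighbor `window` are illustrative choices, not the paper's actual format.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_title: str  # hypothetical stand-in for global metadata
    text: str       # the short, standalone chunk


def retrieval_view(chunks, i, window=1):
    """Text to embed for retrieval: metadata, neighboring context, and the chunk.

    `chunks` is assumed to be the ordered chunk list of a single document.
    """
    lo = max(0, i - window)
    hi = min(len(chunks), i + window + 1)
    context = " ".join(c.text for c in chunks[lo:hi])
    return (
        f"[title] {chunks[i].doc_title} "
        f"[context] {context} "
        f"[chunk] {chunks[i].text}"
    )


def generation_view(chunks, i):
    """Text given to the LLM: the standalone chunk only."""
    return chunks[i].text


if __name__ == "__main__":
    doc = [Chunk("Doc", "A."), Chunk("Doc", "B."), Chunk("Doc", "C.")]
    print(retrieval_view(doc, 1))   # enriched text used for matching
    print(generation_view(doc, 1))  # concise text used for answering
```

An index built this way would store the embedding of `retrieval_view` as the key and return `generation_view` as the value, so the prompt stays short even though matching used the wider context.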

Paper Structure

This paper contains 16 sections, 7 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Naive RAG suffers from inaccurate retrieval because it uses identical chunk representations for retrieval and generation. The decoupled architecture of HeteRAG addresses this by enhancing retrieval with contextual signals and metadata.
  • Figure 2: The overall framework of HeteRAG. The left shows naive RAG using identical representations of knowledge chunks for retrieval and generation. The right depicts HeteRAG's framework: retrieval incorporates global metadata and multi-granular context, while generation maintains standalone chunk usage.
  • Figure 3: Effect of contextual signals and structured metadata in the HeteRAG framework. The ablation results show that both contribute significantly to the retrieval performance of HeteRAG.
  • Figure 4: The RAG results under varying retrieval numbers (top-$k$). The left side shows the results of three LLMs on the HotpotQA dataset as they vary with top-$k$, using both naive RAG and HeteRAG. The right side displays the performance variation of HeteRAG across five different datasets under various top-$k$ settings.