HeteRAG: A Heterogeneous Retrieval-augmented Generation Framework with Decoupled Knowledge Representations
Peiru Yang, Xintian Li, Zhiyang Hu, Jiapeng Wang, Jinhua Yin, Huili Wang, Lizhi He, Shuai Yang, Shangguang Wang, Yongfeng Huang, Tao Qi
TL;DR
The paper tackles the mismatch between what retrieval and generation each need from knowledge chunks in retrieval-augmented generation (RAG). It introduces HeteRAG, a heterogeneous RAG framework that decouples knowledge chunk representations: retrieval uses context-enriched, multi-granular signals and global metadata to improve recall, while generation uses concise, standalone chunks for efficient, precise answers. An adaptive prompt-tuning strategy aligns the retriever with this heterogeneous setup. The authors formulate the retrieval-generation problem, present the dual-path knowledge representations, and train soft prompts with contrastive learning (InfoNCE) to adapt the retriever to domain-specific corpora. Experiments across BEIR datasets, multiple embedding models, and three LLMs show consistent gains in retrieval and end-to-end QA, with robustness to chunk size and top-$k$ retrieval settings, which points to practical benefits for real-world RAG deployments.
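To make the contrastive objective mentioned above concrete, here is a minimal sketch of an InfoNCE loss for a single query. It is an illustration under stated assumptions, not the authors' implementation: the function name `info_nce`, the use of cosine similarity, and the temperature value are all hypothetical choices, and the soft-prompt parameters that would produce the query embedding are not modeled.

```python
import math


def info_nce(query, positive, negatives, temperature=0.05):
    """InfoNCE loss for one query, one positive, and a list of negatives.

    Embeddings are plain lists of floats. Cosine similarity and the
    temperature of 0.05 are assumptions made for this sketch.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    # The positive logit comes first, followed by one logit per negative.
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]

    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))

    # Negative log-probability of picking the positive among all candidates.
    return log_denominator - logits[0]
```

The loss is small when the query embedding is closer to the positive chunk than to the negatives, and grows when a negative scores higher. In a prompt-tuning setup, only the soft-prompt parameters would be updated to reduce this loss while the base retriever stays frozen.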
Abstract
Retrieval-augmented generation (RAG) methods can enhance the performance of LLMs by incorporating retrieved knowledge chunks into the generation process. The retrieval and generation steps usually have different requirements for these knowledge chunks. The retrieval step benefits from comprehensive information to improve retrieval accuracy, whereas excessively long chunks may introduce redundant contextual information, thereby diminishing both the effectiveness and efficiency of the generation process. However, existing RAG methods typically employ identical representations of knowledge chunks for both retrieval and generation, resulting in suboptimal performance. In this paper, we propose a heterogeneous RAG framework (HeteRAG) that decouples the representations of knowledge chunks for retrieval and generation, thereby enhancing LLMs in both effectiveness and efficiency. Specifically, we use short chunks to represent knowledge for the generation step, and we use each chunk together with its contextual information from multi-granular views to enhance retrieval accuracy. We further introduce an adaptive prompt tuning method that adapts the retrieval model to the heterogeneous retrieval-augmented generation process. Extensive experiments demonstrate that HeteRAG achieves significant improvements compared to baselines.
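The decoupling described in the abstract can be sketched as a chunk that carries two representations: an enriched text used only for ranking, and a short text that is the only thing handed to the LLM. The sketch below is a toy illustration under assumptions, not the paper's pipeline: the names `KnowledgeChunk`, `build_chunks`, and `retrieve_for_generation` are hypothetical, neighboring sentences plus a document title stand in for the multi-granular context and metadata, and a word-overlap score stands in for a real embedding model.

```python
import re
from dataclasses import dataclass


@dataclass
class KnowledgeChunk:
    generation_text: str  # short standalone chunk, passed to the LLM
    retrieval_text: str   # same chunk plus context, used only for ranking


def build_chunks(title, sentences, window=1):
    """Build one chunk per sentence; enrich the retrieval view with the
    document title and up to `window` neighboring sentences on each side."""
    chunks = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window)
        end = min(len(sentences), i + window + 1)
        context = " ".join(sentences[start:end])
        chunks.append(KnowledgeChunk(sentence, f"{title}. {context}"))
    return chunks


def overlap_score(query, text):
    """Toy relevance score: number of shared lowercase word tokens.
    A real system would compare dense embeddings instead."""
    query_tokens = set(re.findall(r"\w+", query.lower()))
    text_tokens = set(re.findall(r"\w+", text.lower()))
    return len(query_tokens & text_tokens)


def retrieve_for_generation(query, chunks, top_k=1):
    """Rank chunks by their enriched retrieval text, but return only the
    short generation text of the top-k results."""
    ranked = sorted(
        chunks,
        key=lambda c: overlap_score(query, c.retrieval_text),
        reverse=True,
    )
    return [c.generation_text for c in ranked[:top_k]]
```

With this layout, a query that names the document's subject can match a sentence that only refers to it by pronoun, because the title is present in `retrieval_text`, while the prompt sent to the LLM contains just the short sentence.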
