Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration

Ran Xu, Wenqi Shi, Yuchen Zhuang, Yue Yu, Joyce C. Ho, Haoyu Wang, Carl Yang

TL;DR

Collab-RAG tackles the challenge of complex, multi-hop QA in retrieval-augmented generation by introducing a collaboration between a white-box small language model as a query decomposer and a black-box large language model as a context reader. The approach trains the SLM through iterative preference optimization using feedback from an affordable LLM (GPT-4o-mini), avoiding costly distillation from frontier models. Empirical results across five datasets show consistent improvements over both black-box-only and SLM-based baselines, with notable efficiency: a 3B SLM can surpass a 32B frozen LLM in decomposition, while an 8B decomposer yields strong gains overall. The method offers a scalable, generalizable pathway to enhance complex QA in RAG, with potential extensions to online reinforcement learning.

Abstract

Retrieval-Augmented Generation (RAG) systems often struggle to handle multi-hop question-answering tasks accurately due to irrelevant context retrieval and limited complex reasoning capabilities. We introduce Collab-RAG, a collaborative training framework that leverages mutual enhancement between a white-box small language model (SLM) and a black-box large language model (LLM) for RAG. Specifically, the SLM decomposes complex queries into simpler sub-questions, thus enhancing the accuracy of the retrieval and facilitating more effective reasoning by the black-box LLM. Concurrently, the black-box LLM provides feedback signals to improve the SLM's decomposition capability. We observe that Collab-RAG relies solely on supervision from an affordable black-box LLM without additional distillation from frontier LLMs, yet demonstrates strong generalization across multiple black-box LLMs. Experimental evaluations across five multi-hop QA datasets demonstrate that Collab-RAG substantially outperforms existing black-box-only and SLM fine-tuning baselines by 1.8%-14.2% on average. In particular, our fine-tuned 3B SLM surpasses a frozen 32B LLM in question decomposition, highlighting the efficiency of Collab-RAG in improving reasoning and retrieval for complex questions. The code of Collab-RAG is available at https://github.com/ritaranx/Collab-RAG/.
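The decompose-retrieve-read flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `#k` placeholder convention for referring to earlier answers, and the toy stubs standing in for the SLM decomposer, the retriever, and the black-box LLM reader are all assumptions made for illustration.

```python
from typing import Callable, List


def answer_with_decomposition(
    question: str,
    decompose: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
    read: Callable[[str, List[str]], str],
) -> str:
    """Answer sub-questions in order and return the answer to the last one."""
    answers: List[str] = []
    for sub_q in decompose(question):
        # Assumed convention: "#k" in a sub-question refers to the k-th earlier answer.
        # Substitute higher indices first so "#1" does not clobber "#10".
        for k in range(len(answers), 0, -1):
            sub_q = sub_q.replace(f"#{k}", answers[k - 1])
        passages = retrieve(sub_q)             # retrieval per sub-question
        answers.append(read(sub_q, passages))  # reader answers from the passages
    if not answers:
        # No decomposition produced: fall back to plain single-step RAG.
        return read(question, retrieve(question))
    return answers[-1]


# Hypothetical toy stand-ins for the SLM decomposer, retriever, and LLM reader.
TOY_CORPUS = {
    "Who directed Inception?": "Christopher Nolan",
    "Where was Christopher Nolan born?": "London",
}


def toy_decompose(question: str) -> List[str]:
    return ["Who directed Inception?", "Where was #1 born?"]


def toy_retrieve(sub_question: str) -> List[str]:
    return [TOY_CORPUS.get(sub_question, "")]


def toy_read(sub_question: str, passages: List[str]) -> str:
    return passages[0]


result = answer_with_decomposition(
    "Where was the director of Inception born?",
    toy_decompose,
    toy_retrieve,
    toy_read,
)
print(result)
```

In a real system the three callables would wrap the fine-tuned SLM, a dense or sparse retriever, and an API call to the black-box LLM; keeping them as plain function arguments makes it easy to swap readers, which mirrors the abstract's claim of generalization across multiple black-box LLMs.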

Paper Structure

This paper contains 23 sections, 6 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Comparison of various LLM-based RAG pipelines. Collab-RAG fosters collaboration between the SLM query decomposer and the LLM reader, allowing them to enhance each other.
  • Figure 2: The iterative training framework of Collab-RAG. The SLM updates its parameters based on the generation quality of the LLM reader. The above process is conducted over multiple iterations to gradually improve SLM's decomposition capability.
  • Figure 3: Different LLM Readers
  • Figure 4: Collab-RAG vs. Distillation
  • Figure 5: Additional Studies. GPT-4o-mini as the default LLM reader.
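
The feedback loop summarized in the Figure 2 caption, where the SLM is updated based on the generation quality of the LLM reader, could be sketched as the construction of one preference pair per question. This is a hedged illustration only: the excerpt does not specify the sampling scheme or the reward, so the exact-match score, the best-versus-worst pairing rule, and all function names below are assumptions.

```python
from typing import Callable, List, Optional, Tuple

Decomposition = List[str]


def exact_match(prediction: str, gold: str) -> float:
    """Assumed reward: 1.0 if the normalized strings match, else 0.0."""
    return float(prediction.strip().lower() == gold.strip().lower())


def build_preference_pair(
    question: str,
    gold: str,
    candidates: List[Decomposition],
    answer_fn: Callable[[str, Decomposition], str],
    score_fn: Callable[[str, str], float] = exact_match,
) -> Optional[Tuple[Decomposition, Decomposition]]:
    """Return (chosen, rejected) decompositions, or None if no signal exists."""
    # Score each candidate by the quality of the reader's final answer.
    scored = [(score_fn(answer_fn(question, c), gold), c) for c in candidates]
    if not scored:
        return None
    best = max(scored, key=lambda item: item[0])
    worst = min(scored, key=lambda item: item[0])
    if best[0] == worst[0]:
        # All candidates scored the same, so there is no preference to learn.
        return None
    return best[1], worst[1]


# Hypothetical reader stub: only the second decomposition leads to the gold answer.
def toy_answer(question: str, decomposition: Decomposition) -> str:
    return "Paris" if decomposition == ["b"] else "Rome"


pair = build_preference_pair("q", "Paris", [["a"], ["b"]], toy_answer)
print(pair)
```

Pairs collected this way could then be passed to a preference-optimization trainer for the SLM, and the sampling and pairing repeated over several iterations.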