CoRAG: Collaborative Retrieval-Augmented Generation

Aashiq Muhamed, Mona Diab, Virginia Smith

TL;DR

CoRAG is a framework that extends RAG to collaborative settings, in which clients jointly train a shared model using a collaborative passage store. To evaluate it, the authors introduce CRAB, a benchmark for collaborative homogeneous open-domain question answering.

Abstract

Retrieval-Augmented Generation (RAG) models excel in knowledge-intensive tasks, especially under few-shot learning constraints. We introduce CoRAG, a framework extending RAG to collaborative settings, where clients jointly train a shared model using a collaborative passage store. To evaluate CoRAG, we introduce CRAB, a benchmark for collaborative homogeneous open-domain question answering. Our experiments demonstrate that CoRAG consistently outperforms both parametric collaborative learning methods and locally trained RAG models in low-resource scenarios. Further analysis reveals the critical importance of relevant passages within the shared store, the surprising benefits of incorporating irrelevant passages, and the potential for hard negatives to degrade performance. This introduces a novel consideration in collaborative RAG: the trade-off between leveraging a collectively enriched knowledge base and the risk of incorporating detrimental passages from other clients. Our findings underscore the viability of CoRAG, while also highlighting key design challenges and promising avenues for future research.
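To make the collaborative setting concrete, here is a minimal toy sketch, not the authors' implementation. It assumes the shared passage store is simply the union of every client's passages, uses a toy token-overlap scorer as a stand-in for a real retriever, and assumes FedAvg-style weighted parameter averaging for the shared model. All names (`CLIENT_PASSAGES`, `build_collaborative_store`, `fedavg`, and so on) and all passages are hypothetical.

```python
from collections import Counter

# Hypothetical toy passages held by each client (not from the paper).
CLIENT_PASSAGES = {
    "client_1": ["the eiffel tower is in paris", "paris is the capital of france"],
    "client_2": ["mount everest is the highest mountain", "the nile is a river in africa"],
}


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def overlap_score(query: str, passage: str) -> int:
    # Toy lexical scorer: number of shared tokens (stand-in for a learned retriever).
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    return sum((q & p).values())


def retrieve(query: str, store: list[str], k: int = 1) -> list[str]:
    # Return the k highest-scoring passages; ties keep store order (stable sort).
    return sorted(store, key=lambda p: overlap_score(query, p), reverse=True)[:k]


def build_collaborative_store(client_passages: dict[str, list[str]]) -> list[str]:
    # Assumption: the shared store is the union of all clients' passages.
    store: list[str] = []
    for passages in client_passages.values():
        store.extend(passages)
    return store


def fedavg(client_weights: list[list[float]], num_examples: list[int]) -> list[float]:
    # Assumption: FedAvg-style aggregation, weighting each client by its data size.
    total = sum(num_examples)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, num_examples)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    query = "where is the eiffel tower"
    print("local (client_2):", retrieve(query, CLIENT_PASSAGES["client_2"]))
    print("collaborative:", retrieve(query, build_collaborative_store(CLIENT_PASSAGES)))
    print("aggregated params:", fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

With only its local store, `client_2` cannot retrieve the passage that answers the query; with the union store it can. This illustrates the upside of pooling passages that the abstract describes. The same pooling is also how another client's unhelpful or misleading passages can enter retrieval.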

Paper Structure

This paper contains 36 sections, 10 equations, 2 figures, 6 tables, and 1 algorithm.

Figures (2)

  • Figure 1: Performance of Flan-T5, RAG (Local), and CoRAG on CRAB. CoRAG consistently outperforms Flan-T5 across training configurations. The performance gap between CoRAG and the baselines widens as the number of training samples per client decreases.
  • Figure 2: 64-shot EM scores on the CRAB benchmark. L is Local and CL is Collaborative. CoRAG consistently improves over RAG (Local) across all clients (1-8) and store choices. The size of the improvement depends on the composition of the passage store.
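Figure 2 reports EM scores. In open-domain QA, EM usually denotes exact match: a prediction scores 1 if, after normalization, it equals any gold answer, and 0 otherwise. The sketch below assumes the widely used SQuAD-style normalization (lowercase, strip punctuation, drop the articles "a", "an", "the", collapse whitespace). The paper's exact normalization is not stated here, and the function names are illustrative.

```python
import re
import string

PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    # SQuAD-style normalization (assumed): lowercase, strip punctuation,
    # remove articles, collapse whitespace.
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold_answers: list[str]) -> float:
    # 1.0 if the normalized prediction equals any normalized gold answer.
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(g) for g in gold_answers))


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    # Corpus-level EM as a percentage over all questions.
    hits = sum(exact_match(p, g) for p, g in zip(predictions, references))
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    preds = ["The Eiffel Tower.", "London"]
    golds = [["Eiffel Tower"], ["Paris"]]
    print(em_score(preds, golds))
```

Because multiple gold aliases are accepted, a prediction counts as correct if it matches any one of them.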

Theorems & Definitions (4)

  • Definition G.1: The CoRAG Participation Game
  • Definition G.2: Nash Equilibria in the CoRAG Game
  • Definition G.3: Reward Allocation Mechanism
  • Definition G.4: CoRAG Game with Incentive Mechanisms
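Definitions G.1 to G.4 frame participation in CoRAG as a game with incentive mechanisms. The toy sketch below does not reproduce those definitions. It is a hypothetical two-client game in which each client either contributes passages to the shared store or withholds them. It enumerates pure-strategy Nash equilibria by checking unilateral deviations, then adds a flat reward for contributing as a stand-in for a reward allocation mechanism. The payoff values, the flat-reward rule, and all function names are assumptions chosen only for illustration.

```python
from itertools import product

ACTIONS = ("contribute", "withhold")

# Hypothetical payoffs (client_a, client_b); not taken from the paper.
# Withholding free-rides on the other client's passages (prisoner's-dilemma shape).
BASE_PAYOFFS = {
    ("contribute", "contribute"): (3.0, 3.0),
    ("contribute", "withhold"): (1.0, 4.0),
    ("withhold", "contribute"): (4.0, 1.0),
    ("withhold", "withhold"): (2.0, 2.0),
}


def with_reward(payoffs, reward):
    """Add a flat (hypothetical) reward to every client that contributes."""
    return {
        profile: tuple(
            u + (reward if action == "contribute" else 0.0)
            for u, action in zip(utils, profile)
        )
        for profile, utils in payoffs.items()
    }


def is_nash(profile, payoffs):
    """True if no single client gains by unilaterally switching its action."""
    for player in range(len(profile)):
        current = payoffs[profile][player]
        for alt in ACTIONS:
            deviation = list(profile)
            deviation[player] = alt
            if payoffs[tuple(deviation)][player] > current:
                return False
    return True


def pure_nash_equilibria(payoffs):
    return [p for p in product(ACTIONS, repeat=2) if is_nash(p, payoffs)]


if __name__ == "__main__":
    print("no incentive:", pure_nash_equilibria(BASE_PAYOFFS))
    print("reward 1.5:  ", pure_nash_equilibria(with_reward(BASE_PAYOFFS, 1.5)))
```

Under these made-up payoffs, mutual withholding is the only equilibrium without incentives. A contribution reward of 1.5 shifts the unique equilibrium to mutual contribution, which is the kind of effect an incentive mechanism is meant to achieve.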