CoCoA: Collaborative Chain-of-Agents for Parametric-Retrieved Knowledge Synergy
Yi Jiang, Sendong Zhao, Jianbo Li, Haochun Wang, Lizhe Zhang, Yan Liu, Bing Qin
TL;DR
CoCoA tackles the limited synergy between a model's internal parametric knowledge and external retrieved data in retrieval-augmented generation. It introduces CoCoA-zero, a two-stage multi-agent framework that first induces internal and external knowledge and then performs high-level decision making. Long-chain training then distills this collaborative reasoning into a fine-tuned LLM. The approach yields state-of-the-art results on open-domain and multi-hop QA benchmarks, with thorough ablations showing the importance of each component and the benefits of long-chain optimization. The work demonstrates that explicit, structured collaboration between internal and external knowledge, learned through supervised long-form trajectories and preference optimization, can significantly enhance knowledge-intensive reasoning and generalize beyond QA tasks, albeit with higher inference cost. Its findings have practical implications for building more robust and interpretable RAG systems that leverage both embedded knowledge and retrieved information.
Abstract
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs), especially for knowledge-intensive tasks. Despite its advantages, current RAG methods often struggle to fully exploit knowledge during generation. In particular, the synergy between the model's internal parametric knowledge and external retrieved knowledge remains limited. Retrieved content may sometimes mislead generation, while certain generated content can guide the model toward more accurate outputs. In this work, we propose Collaborative Chain-of-Agents, a framework designed to explicitly enhance the synergy between parametric and retrieved knowledge. Specifically, we introduce CoCoA-zero, a multi-agent RAG framework that first performs conditional knowledge induction and then reasons toward answers. Building on this, we develop CoCoA, a long-chain training strategy that synthesizes extended multi-agent reasoning trajectories from CoCoA-zero to fine-tune the LLM. This strategy enhances the model's capability to explicitly integrate and jointly leverage parametric and retrieved knowledge. Experimental results demonstrate the superiority of CoCoA in open-domain QA and multi-hop QA.
