C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation

Guoxin Chen, Minpeng Liao, Peiying Yu, Dingmin Wang, Zile Qiao, Chao Yang, Xin Zhao, Kai Fan

TL;DR

C-3PO introduces a proxy-centric alignment framework that coordinates retrieval, planning, and answer generation through a lightweight multi-agent system, without modifying existing retrievers or LLMs. It uses a tree-structured rollout with Monte Carlo credit assignment, within cooperative multi-agent reinforcement learning (MARL), to distribute system-level rewards across specialized agents and to optimize the entire RAG pipeline end-to-end with proximal policy optimization (PPO). The approach yields substantial performance gains on both in-domain and out-of-distribution tasks, generalizes well to unseen retrievers and LLMs, and is efficient enough for edge deployment. Overall, C-3PO is a robust, plug-and-play way to improve RAG systems by rethinking how components are coordinated rather than tuning the components alone.

Abstract

Retrieval-augmented generation (RAG) systems face a fundamental challenge in aligning independently developed retrievers and large language models (LLMs). Existing approaches typically involve modifying either component or introducing simple intermediate modules, resulting in practical limitations and sub-optimal performance. Inspired by human search behavior, which typically involves a back-and-forth process of proposing search queries and reviewing documents, we propose C-3PO, a proxy-centric framework that facilitates communication between retrievers and LLMs through a lightweight multi-agent system. Our framework implements three specialized agents that collaboratively optimize the entire RAG pipeline without altering the retriever or the LLMs. These agents work together to assess the need for retrieval, generate effective queries, and select information suitable for the LLMs. To enable effective multi-agent coordination, we develop a tree-structured rollout approach for reward credit assignment in reinforcement learning. Extensive experiments in both in-domain and out-of-distribution scenarios demonstrate that C-3PO significantly enhances RAG performance while maintaining plug-and-play flexibility and superior generalization capabilities.
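
The abstract names a tree-structured rollout for reward credit assignment but does not give the procedure here. The following is a minimal sketch of one plausible reading, assuming that only leaves of the rollout tree receive a system-level reward (for example, 1.0 when the final answer is correct) and that each intermediate decision is credited with the Monte Carlo average of the leaf rewards beneath it. The `Node` class, the `assign_credit` function, and the agent labels are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One agent decision in a rollout tree (hypothetical structure)."""
    agent: str                      # illustrative label, e.g. "router"
    action: str                     # the decision or text the agent produced
    reward: Optional[float] = None  # system-level reward, set on leaves only
    children: List["Node"] = field(default_factory=list)
    value: float = 0.0              # credit estimate, filled by assign_credit


def assign_credit(node: Node) -> Tuple[float, int]:
    """Set node.value to the mean leaf reward in the node's subtree.

    Returns (sum of leaf rewards, number of leaves) so that parents can
    average over all descendant leaves with equal weight per leaf.
    """
    if not node.children:
        if node.reward is None:
            raise ValueError("leaf nodes must carry a system-level reward")
        node.value = node.reward
        return node.reward, 1

    total, count = 0.0, 0
    for child in node.children:
        child_total, child_count = assign_credit(child)
        total += child_total
        count += child_count
    node.value = total / count
    return total, count


# Toy rollout: one retrieval decision, two candidate queries, and two
# document selections per query. Leaf rewards are made-up illustrative values.
query_a = Node("query_generator", "query A", children=[
    Node("selector", "keep docs 1,2", reward=1.0),
    Node("selector", "keep doc 3", reward=0.0),
])
query_b = Node("query_generator", "query B", children=[
    Node("selector", "keep docs 1,4", reward=1.0),
    Node("selector", "keep doc 2", reward=1.0),
])
root = Node("router", "retrieve", children=[query_a, query_b])

assign_credit(root)
print(root.value, query_a.value, query_b.value)
```

In this toy tree, `query_b` receives more credit than `query_a` because both of its continuations end in a rewarded answer, while the shared root is credited with the average over all four leaves. Per-node values of this kind could then serve as reward signals for a policy-gradient method such as PPO, although how the paper wires them into training is not stated in this excerpt.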

Paper Structure

This paper contains 31 sections, 11 equations, 6 figures, 9 tables, 1 algorithm.

Figures (6)

  • Figure 1: Overall framework of C-3PO. (Upper left) Essential cognitive capabilities required for effective RAG system interaction in human-guided alignment. (Upper right) Our proxy-centric alignment simulates these human-like interactions through a lightweight multi-agent system with collaborative strategies. (Bottom) The end-to-end optimization pipeline for our multi-agent system.
  • Figure 2: Ablation Study.
  • Figure 3: Performance and Efficiency Comparison.
  • Figure 4: Average Accuracy of C-3PO during RL training.
  • Figure 5: Strategy Ratio in RL training process.
  • ...and 1 more figure