M-RAG: Reinforcing Large Language Model Performance through Retrieval-Augmented Generation with Multiple Partitions

Zheng Wang, Shu Xian Teo, Jieer Ouyang, Yongjun Xu, Wei Shi

TL;DR

M-RAG tackles the noise and inefficiency of full-database retrieval in retrieval-augmented generation by partitioning the memory store and coordinating two RL agents. Agent-S selects a partition while Agent-R refines memories within that partition, with end-to-end training via multi-agent DQN to optimize generation quality. Across seven datasets and three generation tasks, M-RAG delivers meaningful gains (up to 11% in summarization, 8% in translation, and 12% in dialogue) and demonstrates the value of partition-aware retrieval. The approach also highlights practical benefits for indexing, privacy, and distributed processing, offering a scalable pathway to improved grounding in LLM-driven applications.

Abstract

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by retrieving relevant memories from an external database. However, existing RAG methods typically organize all memories in a whole database, potentially limiting focus on crucial memories and introducing noise. In this paper, we introduce a multiple partition paradigm for RAG (called M-RAG), where each database partition serves as a basic unit for RAG execution. Based on this paradigm, we propose a novel framework that leverages LLMs with Multi-Agent Reinforcement Learning to optimize different language generation tasks explicitly. Through comprehensive experiments conducted on seven datasets, spanning three language generation tasks and involving three distinct language model architectures, we confirm that M-RAG consistently outperforms various baseline methods, achieving improvements of 11%, 8%, and 12% for text summarization, machine translation, and dialogue generation, respectively.
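
For readers unfamiliar with the multi-agent DQN training mentioned in the TL;DR, the generic textbook DQN objective is given below as a reference point. The symbols (state s, action a, reward r, discount γ, online parameters θ, target parameters θ⁻) are the standard ones and are assumptions here; the paper's own state, action, and reward definitions for Agent-S and Agent-R are not reproduced in this summary.

```latex
y = r + \gamma \max_{a'} Q(s', a'; \theta^{-}), \qquad
\mathcal{L}(\theta) = \mathbb{E}_{(s, a, r, s')}\left[ \big( y - Q(s, a; \theta) \big)^{2} \right]
```

In a setup like the one described, each agent would hold its own Q-function, with the reward presumably tied to generation quality.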

Paper Structure

This paper contains 14 sections, 7 equations, 2 figures, 8 tables, 1 algorithm.

Figures (2)

  • Figure 1: Comparison with database partitioning strategies for language generation tasks.
  • Figure 2: Illustration of M-RAG training in a summarization task. M-RAG begins training with multiple partitions, then selects a partition for retrieval via Agent-S and refines the memories within the selected partition via Agent-R. Both agents are trained collaboratively through multi-agent reinforcement learning to improve generation. Inference includes elements (1), (2), (3), (4), (11), and (12).
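
The select-then-refine flow described in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the learned Agent-S and Agent-R policies are replaced by a simple word-overlap heuristic, partitions are built from a hypothetical `topic` field, and all names (`Memory`, `build_partitions`, `select_partition`, `refine_memory`, `build_prompt`) are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    topic: str  # hypothetical metadata, used here only to form partitions


def build_partitions(memories):
    """Group memories into partitions (here: by topic, an assumed strategy)."""
    partitions = {}
    for m in memories:
        partitions.setdefault(m.topic, []).append(m)
    return partitions


def overlap(query, text):
    """Fraction of distinct query words that also occur in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / max(len(q), 1)


def select_partition(query, partitions):
    """Stand-in for Agent-S: choose the partition holding the best match.

    The paper trains this choice with reinforcement learning; a heuristic
    score is used here purely for illustration.
    """
    return max(
        partitions,
        key=lambda name: max(overlap(query, m.text) for m in partitions[name]),
    )


def refine_memory(query, partition):
    """Stand-in for Agent-R: keep the single best memory in the partition."""
    return max(partition, key=lambda m: overlap(query, m.text))


def build_prompt(query, memory):
    """Combine the retrieved memory with the input for a downstream LLM."""
    return f"Context: {memory.text}\nInput: {query}"


if __name__ == "__main__":
    store = [
        Memory("the team won the final match", "sports"),
        Memory("the striker scored two goals", "sports"),
        Memory("the stock market fell sharply", "finance"),
        Memory("central bank raised interest rates", "finance"),
    ]
    query = "why the stock market fell"
    parts = build_partitions(store)
    chosen = select_partition(query, parts)
    best = refine_memory(query, parts[chosen])
    print(chosen)
    print(build_prompt(query, best))
```

Because retrieval runs only inside the chosen partition, memories in the other partitions never reach the prompt, which is the noise-reduction effect the partition paradigm is aiming for.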