When Do Multi-Agent Systems Outperform? Analysing the Learning Efficiency of Agentic Systems

Junwei Su; Chuan Wu

TL;DR

This work asks when multi-agent reinforcement learning (MARL) outperforms single-agent RL (SARL) for agentic large language models (LLMs). By casting SARL and MARL in a PAC-learning framework and modeling task decomposition as either dependent or independent subtasks, the authors derive explicit sample-complexity bounds and identify the regimes in which MARL has an advantage. They introduce task alignment, quantified by an alignment factor $α$, and provide bounds for imperfect alignment, showing that MARL remains advantageous only under conditions that jointly involve subtask independence, parameter efficiency, and alignment. Synthetic and GSM8K experiments complement the theory, and together they yield practical criteria for deciding when to deploy MARL for LLM-based tasks, reconciling conflicting empirical observations and guiding efficient RLHF-style training. Overall, the paper offers a theory-driven roadmap for leveraging MARL in complex, task-decomposed LLM scenarios.

Abstract

Reinforcement Learning (RL) has emerged as a crucial method for training and fine-tuning large language models (LLMs), enabling adaptive, task-specific optimization through interactive feedback. Multi-Agent Reinforcement Learning (MARL) in particular offers a promising avenue: it decomposes complex tasks into specialized subtasks learned by distinct, interacting agents, potentially improving both the capability and the efficiency of LLM systems. However, theoretical insight into when and why MARL outperforms Single-Agent RL (SARL) remains limited, leaving uncertainty about which RL framework to choose. In this paper, we address this critical gap by rigorously analyzing the comparative sample efficiency of MARL and SARL for LLMs. Using the Probably Approximately Correct (PAC) framework, we formally define SARL and MARL setups for LLMs, derive explicit sample-complexity bounds, and systematically characterize how task decomposition and alignment influence learning efficiency. Our results show that MARL improves sample complexity when tasks naturally decompose into independent subtasks, whereas dependent subtasks diminish MARL's comparative advantage. We further introduce and analyze the concept of task alignment, quantifying the trade-offs of enforcing an independent task decomposition despite potential misalignment. These theoretical insights clarify inconsistencies in empirical findings and provide practical criteria for deploying MARL strategies effectively in complex LLM scenarios.
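For readers less familiar with the PAC framework, the generic $(\epsilon, \delta)$ criterion behind "PAC-learnable with sample complexity $m(\epsilon, \delta)$" can be sketched as follows. This is the standard textbook form written for a policy class; the notation ($\Pi$ for the policy class, $J$ for expected return, $\hat{\pi}_m$ for the policy learned from $m$ samples) is assumed here for illustration, and the paper's Definition 1 may state it differently.

```latex
% Generic PAC criterion (textbook form; notation assumed, not taken from the paper)
J(\pi) = \mathbb{E}_{\tau \sim \pi}\big[ R(\tau) \big], \qquad
\pi^\star \in \arg\max_{\pi \in \Pi} J(\pi),
\\[4pt]
\forall\, \epsilon, \delta \in (0,1):\quad
m \ge m(\epsilon, \delta) \;\Longrightarrow\;
\Pr\Big[\, J(\pi^\star) - J(\hat{\pi}_m) \le \epsilon \,\Big] \ge 1 - \delta .
```

Under this reading, comparing SARL and MARL amounts to comparing the smallest $m(\epsilon, \delta)$ each setup needs to meet the same accuracy $\epsilon$ and confidence $1 - \delta$.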

Paper Structure

This paper contains 95 sections, 16 theorems, 130 equations, 1 figure, and 1 table.

Introduction
Existing Gap.
Challenges.
Contribution.
Related Work
Agentic LLM System.
Theoretical Analysis of RL.
Preliminaries and Problem Formulation
Notation.
Single-Agent Reinforcement Learning (SARL) Formulation
Empirical optimization.
Multi-Agent Reinforcement Learning (MARL) Formulation
Specialized, turn-taking policies.
Task decomposition.
Empirical Optimization.
...and 80 more sections

Key Result

Theorem 4.1

Under the assumptions from bounded reward through finite horizon, the SARL framework is PAC-learnable with sample complexity:

Figures (1)

Figure 1: Empirical comparison of SARL and MARL learning efficiency under (a) dependent and (b) independent task decompositions on synthetic tasks, and under (d) dependent and (e) independent decompositions on GSM8K. Panel (c) shows the effect of task alignment on the relative learning efficiency of SARL and MARL, where “strong” denotes high alignment and “weak” denotes low alignment.

Theorems & Definitions (17)

Definition 1: PAC Learnability
Theorem 4.1: PAC sample complexity for SARL
Theorem 4.2: PAC Sample Complexity for MARL: Dependent Sub-task
Theorem 4.3: PAC Sample Complexity for MARL: Independent Sub-task
Proposition 4.4
Proposition 4.5
Theorem 4.6: PAC Sample Complexity for MARL
Proposition 4.7: Condition for MARL Advantage under Imperfect Alignment
Lemma D.1: TV control for autoregressive sequence
Lemma D.2: Parameter-to-value Lipschitzness
...and 7 more
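Definition 1 and Theorems 4.1 to 4.6 are all phrased as PAC sample-complexity statements. As a hedged illustration of how such a sample size is computed in the simplest case, the sketch below evaluates the two standard textbook bounds for a finite hypothesis class with losses in $[0, 1]$. These are not the paper's SARL or MARL bounds, which are stated for LLM policy classes under the paper's own assumptions; the function names are hypothetical.

```python
import math


def pac_sample_size_realizable(num_hypotheses: int, eps: float, delta: float) -> int:
    """Textbook realizable bound for a finite class H (hypothetical helper).

    m >= (ln|H| + ln(1/delta)) / eps samples suffice for any consistent
    learner to reach error <= eps with probability >= 1 - delta.
    """
    if num_hypotheses < 1:
        raise ValueError("num_hypotheses must be >= 1")
    if not (0.0 < eps < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("eps and delta must lie in (0, 1)")
    return math.ceil((math.log(num_hypotheses) + math.log(1.0 / delta)) / eps)


def pac_sample_size_agnostic(num_hypotheses: int, eps: float, delta: float) -> int:
    """Textbook agnostic bound via Hoeffding plus a union bound (hypothetical helper).

    m >= ln(2|H|/delta) / (2 eps^2) makes every empirical loss within eps of its
    true loss with probability >= 1 - delta, so ERM is 2*eps-optimal.
    """
    if num_hypotheses < 1:
        raise ValueError("num_hypotheses must be >= 1")
    if not (0.0 < eps < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("eps and delta must lie in (0, 1)")
    return math.ceil(math.log(2.0 * num_hypotheses / delta) / (2.0 * eps ** 2))


if __name__ == "__main__":
    # |H| = 1000 hypotheses, accuracy eps = 0.1, confidence 1 - delta = 0.95
    print(pac_sample_size_realizable(1000, 0.1, 0.05))  # 100
    print(pac_sample_size_agnostic(1000, 0.1, 0.05))    # 530
```

Both bounds grow only logarithmically in the size of the hypothesis class but polynomially in $1/\epsilon$, which is why how a task is decomposed, and therefore how large each learner's effective hypothesis space is, can change the number of samples needed.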
