Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning

Emile Anand; Ishani Karmarkar; Guannan Qu

TL;DR

This work tackles the curse of dimensionality in MARL by introducing SUBSAMPLE-MFQ, which learns a policy for a cooperative system with one global agent and $n$ local agents by subsampling $k$ of the local agents. The method runs mean-field Q-learning on the $k$-agent surrogate system and executes a randomized policy that samples a subset of agents at runtime. With high probability, the optimality gap of the resulting policy decays at a rate of $\tilde{O}(1/\sqrt{k})$, independent of $n$. The key analytical ingredients are a Lipschitz bound in total variation, a sampling-without-replacement concentration bound for the empirical distribution, and an adapted performance-difference lemma; together they yield a learning procedure whose cost is polynomial in $k$. The approach gives an exponential speedup in $n$ when $k=O(\log n)$ and extends to off-policy and linear-MDP-like non-tabular settings, which suggests practical scalability gains for cooperative MARL and potential CTDE-type deployments. Overall, the paper contributes a subsampling framework with provable near-optimality that reduces sample and computational complexity in large-scale multi-agent systems.
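The role of the sampling-without-replacement concentration bound can be illustrated numerically. The sketch below is not the paper's algorithm: it assumes a toy population of $n$ local agents with states drawn from a small finite set, and the helper names (`empirical_distribution`, `tv_distance`, `mean_tv_gap`) are hypothetical. It only measures how far the empirical state distribution of $k$ subsampled agents is, in total variation, from that of the full population.

```python
import random
from collections import Counter


def empirical_distribution(states, support):
    """Fraction of agents in each state of `support`."""
    counts = Counter(states)
    total = len(states)
    return [counts[s] / total for s in support]


def tv_distance(p, q):
    """Total variation distance between two distributions on the same support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mean_tv_gap(local_states, k, support, trials, rng):
    """Average TV distance between the empirical distribution of k agents
    sampled without replacement and that of the full population."""
    full = empirical_distribution(local_states, support)
    gaps = []
    for _ in range(trials):
        subset = rng.sample(local_states, k)  # sampling without replacement
        gaps.append(tv_distance(empirical_distribution(subset, support), full))
    return sum(gaps) / trials


if __name__ == "__main__":
    rng = random.Random(0)
    support = [0, 1, 2]  # assumed toy local state space
    n = 1000             # assumed number of local agents
    local_states = [rng.choice(support) for _ in range(n)]
    for k in (10, 100, 1000):
        print(k, mean_tv_gap(local_states, k, support, 200, rng))
```

In this toy setting the average gap shrinks as $k$ grows and is exactly zero at $k=n$, where the subsample is the whole population. This matches the qualitative behavior the $\tilde{O}(1/\sqrt{k})$ guarantee relies on.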

Abstract

Designing efficient algorithms for multi-agent reinforcement learning (MARL) is fundamentally challenging because the size of the joint state and action spaces grows exponentially in the number of agents. These difficulties are exacerbated when balancing sequential global decision-making with local agent interactions. In this work, we propose a new algorithm $\texttt{SUBSAMPLE-MFQ}$ ($\textbf{Subsample}$-$\textbf{M}$ean-$\textbf{F}$ield-$\textbf{Q}$-learning) and a decentralized randomized policy for a system with $n$ agents. For any $k\leq n$, our algorithm learns a policy for the system in time polynomial in $k$. We prove that this learned policy converges to the optimal policy on the order of $\tilde{O}(1/\sqrt{k})$ as the number of subsampled agents $k$ increases. In particular, this bound is independent of the number of agents $n$.
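To make the gap between "exponential in the number of agents" and "polynomial in $k$" concrete, the following sketch counts states under two representations. It is an illustration under stated assumptions, not the paper's complexity analysis: the local agents are assumed to be interchangeable with a finite local state space, the global agent's state and all actions are ignored, and the function names are hypothetical. The mean-field count uses the standard stars-and-bars formula for the number of empirical distributions of $k$ agents over a fixed set of states.

```python
from math import comb


def joint_state_count(num_local_states, n):
    """Number of joint local-state configurations for n distinguishable agents."""
    return num_local_states ** n


def mean_field_state_count(num_local_states, k):
    """Number of distinct empirical distributions (count vectors) of k
    interchangeable agents over num_local_states states (stars and bars)."""
    return comb(k + num_local_states - 1, num_local_states - 1)


if __name__ == "__main__":
    num_local_states = 3  # assumed toy local state space size
    n = 100               # assumed number of local agents
    for k in (5, 10, 20):
        print(k, mean_field_state_count(num_local_states, k))
    print("joint:", joint_state_count(num_local_states, n))
```

For a fixed local state space, the mean-field count is a polynomial in $k$ of degree one less than the number of local states, whereas the joint count grows exponentially in $n$.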

