Graphon Mean-Field Subsampling for Cooperative Heterogeneous Multi-Agent Reinforcement Learning

Emile Anand; Richard Hoffmann; Sarah Liaw; Adam Wierman

TL;DR

This work tackles scalable cooperative MARL with heterogeneous agent interactions by introducing Graphon Mean-Field Subsampling (GMFS). GMFS replaces full-population graphon aggregations with graphon-weighted subsampling of size $\kappa$, enabling a centralized $Q$-learning formulation over a $(\kappa+1)$-agent surrogate and decentralized execution where each agent samples $\kappa$ neighbors to form local graphon features. The authors prove that the resulting Bellman operators are $\gamma$-contractions and derive an optimality gap bound that decays as $O(1/\sqrt{\kappa})$, along with a polynomial-in-$\kappa$ sample complexity, demonstrating a substantial computational advantage over exhaustive graphon mean-field methods. Numerical experiments in robotic coordination illustrate monotonic performance gains with increasing $\kappa$, approaching the full graphon mean-field limit, thereby offering a principled, scalable path for heterogeneous MARL in large populations.
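The $O(1/\sqrt{\kappa})$ rate matches the usual Monte Carlo rate for an empirical mean. The sketch below illustrates only that estimator-level behavior, not the paper's optimality-gap proof. The exponential-decay weights, the scalar neighbor feature `values`, and sampling with replacement are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
alpha = np.linspace(0.0, 1.0, n)       # latent positions in [0, 1]
values = rng.random(n)                 # hypothetical scalar feature per agent

# Assumed distance-decay weights seen from agent 0, with no self-weight.
w = np.exp(-5.0 * np.abs(alpha - alpha[0]))
w[0] = 0.0
p = w / w.sum()                        # normalized sampling weights
exact = p @ values                     # full weighted aggregate

def rms_error(kappa, trials=2000):
    """RMS error of the kappa-sample estimate of the weighted aggregate."""
    idx = rng.choice(n, size=(trials, kappa), p=p)
    estimates = values[idx].mean(axis=1)
    return np.sqrt(np.mean((estimates - exact) ** 2))

errors = {kappa: rms_error(kappa) for kappa in (4, 16, 64)}
for kappa, err in errors.items():
    print(kappa, err)
```

Under these assumptions, quadrupling $\kappa$ should roughly halve the error.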

Abstract

Coordinating large populations of interacting agents is a central challenge in multi-agent reinforcement learning (MARL), where the size of the joint state-action space scales exponentially with the number of agents. Mean-field methods alleviate this burden by aggregating agent interactions, but these approaches assume homogeneous interactions. Recent graphon-based frameworks capture heterogeneity, but are computationally expensive as the number of agents grows. Therefore, we introduce $\texttt{GMFS}$, a $\textbf{G}$raphon $\textbf{M}$ean-$\textbf{F}$ield $\textbf{S}$ubsampling framework for scalable cooperative MARL with heterogeneous agent interactions. By subsampling $\kappa$ agents according to interaction strength, we approximate the graphon-weighted mean-field and learn a policy with sample complexity $\mathrm{poly}(\kappa)$ and optimality gap $O(1/\sqrt{\kappa})$. We verify our theory with numerical simulations in robotic coordination, showing that $\texttt{GMFS}$ achieves near-optimal performance.
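The subsampling step can be sketched as follows. This is a minimal illustration rather than the authors' implementation. The graphon `radial_graphon`, its `radius` parameter, sampling with replacement, and the use of an empirical state histogram as the aggregate are all assumptions introduced here.

```python
import numpy as np

def radial_graphon(x, y, radius=0.3):
    """Hypothetical distance-decay graphon on [0, 1]: linear decay to 0 at `radius`."""
    return np.maximum(0.0, 1.0 - np.abs(x - y) / radius)

def sampling_weights(alpha, i, graphon=radial_graphon):
    """Normalized weights for agent i, using w_ij = W(alpha_i, alpha_j) and w_ii = 0."""
    w = graphon(alpha[i], alpha)
    w[i] = 0.0
    total = w.sum()
    if total <= 0.0:
        raise ValueError("agent has no neighbor with positive weight")
    return w / total

def subsample_neighbors(alpha, i, kappa, rng, graphon=radial_graphon):
    """Draw kappa neighbor indices (with replacement, an assumption) by weight."""
    p = sampling_weights(alpha, i, graphon)
    return rng.choice(len(alpha), size=kappa, replace=True, p=p)

def sampled_aggregate(states, neighbors, num_states):
    """Empirical distribution of the sampled neighbors' discrete states."""
    counts = np.bincount(states[neighbors], minlength=num_states)
    return counts / len(neighbors)

rng = np.random.default_rng(0)
n, kappa, num_states = 50, 8, 4
alpha = np.linspace(0.0, 1.0, n)              # deterministic latent positions
states = rng.integers(0, num_states, size=n)  # hypothetical discrete local states
neighbors = subsample_neighbors(alpha, i=10, kappa=kappa, rng=rng)
g_hat = sampled_aggregate(states, neighbors, num_states)
print(neighbors, g_hat)
```

Each agent only needs its own row of normalized weights, so the sampling can run locally at execution time.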

Paper Structure (26 sections, 43 theorems, 176 equations, 5 figures, 1 table, 3 algorithms)

Introduction
Contributions
Related Literature
Preliminaries
Problem Formulation
Graphon Mean-Field Subsampling (GMFS)
Theoretical Guarantees and Analysis
Proof Outline
Conclusion
Numerical Simulations and Additional Motivating Examples
Motivating Example: Cooperative Autonomous Driving
Motivating Example: Cooperative Robot Coordination Task
Motivating Example: Energy Distribution for a Smart Grid
Evaluation on Cooperative Robot Coordination Task
Experimental Setup
...and 11 more sections

Key Result

Theorem 4.1

For all states $s\in\mathcal{S}$ and graphon state-aggregates $g\in\mathcal{G}$, if $T\geq \frac{1}{1-\gamma}\log\frac{\|r_\ell\|_\infty \sqrt{\kappa}}{1-\gamma}$, then

Figures (5)

Figure 1: Graphon mean-field systems with distance-decay interactions. (a) Warehouse robots collaborate to transport a payload, where robot $i$ uses $\kappa=8$ subsampled neighbors with interaction strength $W(x_i,x_j)$ indicated by line thickness. (b) Traffic vehicles coordinate using graphon-weighted aggregates $g_i$ of the population, where interaction strength decays with distance in latent position space $\alpha \in [0,1]$.
Figure 2: Schematic of Graphon Mean-Field Sampling. (Left) A continuous graphon $W(x,y)$ represents the infinite-population limit of non-uniform interactions. (Middle) The graphon with deterministic latent positions $\{\alpha_i\}_{i=1}^n\subset [0,1]$ induces a complete weighted interaction graph on $n$ agents with edge weights $w_{ij} = W(\alpha_i, \alpha_j)$ and $w_{ii}=0$. These weights specify the intensity with which agent $i$ aggregates neighbor states into its mean-field features used for learning and control. (Right) Each agent approximates its graphon-weighted neighborhood statistics by sampling a small set of agents (random subsample of $\kappa = 5$) according to the normalized weights $\bar{w}_{ij}$.
Figure 3: Performance-scalability tradeoff of GMFS. (Left) GMFS rapidly achieves near-optimal performance in the robotics coordination task starting from around $\kappa = 8$, approaching the full graphon mean-field baseline at $\kappa = 24$, which corresponds to the optimal solution obtained without sampling (Hu et al., 2022; Fabian et al., 2022). (Right) The computational cost, measured by the number of entries in the discrete neighborhood state space $\mathcal{Z}_\kappa$, grows polynomially in $\kappa$.
Figure 4: Perception time‑evolution comparison for the focal agent (center of the $5 \times 5$ grid) under the radial graphon. Rows correspond to $\kappa \in \{2, 8, 24\}$, while columns correspond to time horizons $t \in \{1, 30, 60, 100\}$. Each panel aggregates the focal agent’s sampled neighbors up to time $t$, with the dashed circle indicating the true interaction ball. As $\kappa$ increases, the empirical neighborhood density converges faster and more uniformly to the support of the radial graphon.
Figure 5: Visualization of a radial graphon. We note that generative AI was used to refine the aesthetics of this figure.
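The polynomial growth reported in Figure 3 (Right) can be illustrated with a simple count. The sketch assumes, without confirmation from the source, that an entry of $\mathcal{Z}_\kappa$ is a histogram of $\kappa$ sampled neighbors over $d$ discrete local states. The helper names and the choice $d = 4$ are hypothetical.

```python
from math import comb

def num_histograms(kappa, d):
    """Count histograms of kappa samples over d states: C(kappa + d - 1, d - 1)."""
    return comb(kappa + d - 1, d - 1)

def num_ordered_tuples(kappa, d):
    """Count ordered kappa-tuples of neighbor states, for comparison: d ** kappa."""
    return d ** kappa

d = 4  # hypothetical number of discrete local states
for kappa in (2, 8, 24):
    print(kappa, num_histograms(kappa, d), num_ordered_tuples(kappa, d))
# kappa = 8  -> 165 histograms versus 65536 ordered tuples
# kappa = 24 -> 2925 histograms
```

For fixed $d$ the histogram count is a polynomial of degree $d-1$ in $\kappa$, while the ordered-tuple count is exponential in $\kappa$.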

Theorems & Definitions (97)

Definition 2.1: Graphon-weighted neighborhood state-action feature for agent $i$
Definition 2.2: $\epsilon$-optimal policy
Definition 3.2: Graphon-weighted Subsampling
Definition 3.3: Sampled neighborhood aggregates
Definition 3.8: Bellman operator $\mathcal{T}$
Definition 3.9: Sampled Bellman operator $\hat{\mathcal{T}}_\kappa$
Definition 3.10: Empirical sampled operator $\widehat{\mathcal{T}}_{\kappa,m}$
Theorem 4.1
Lemma 4.2: Controlling the Bellman Noise
Theorem 4.3
...and 87 more
