Graphon Mean-Field Subsampling for Cooperative Heterogeneous Multi-Agent Reinforcement Learning

Emile Anand, Richard Hoffmann, Sarah Liaw, Adam Wierman

TL;DR

This work tackles scalable cooperative MARL with heterogeneous agent interactions by introducing Graphon Mean-Field Subsampling (GMFS). GMFS replaces full-population graphon aggregations with graphon-weighted subsampling of size $\kappa$, enabling a centralized $Q$-learning formulation over a $(\kappa+1)$-agent surrogate and decentralized execution where each agent samples $\kappa$ neighbors to form local graphon features. The authors prove that the resulting Bellman operators are $\gamma$-contractions and derive an optimality gap bound that decays as $O(1/\sqrt{\kappa})$, along with a polynomial-in-$\kappa$ sample complexity, demonstrating a substantial computational advantage over exhaustive graphon mean-field methods. Numerical experiments in robotic coordination illustrate monotonic performance gains with increasing $\kappa$, approaching the full graphon mean-field limit, thereby offering a principled, scalable path for heterogeneous MARL in large populations.

Abstract

Coordinating large populations of interacting agents is a central challenge in multi-agent reinforcement learning (MARL), where the size of the joint state-action space scales exponentially with the number of agents. Mean-field methods alleviate this burden by aggregating agent interactions, but these approaches assume homogeneous interactions. Recent graphon-based frameworks capture heterogeneity, but are computationally expensive as the number of agents grows. Therefore, we introduce $\texttt{GMFS}$, a $\textbf{G}$raphon $\textbf{M}$ean-$\textbf{F}$ield $\textbf{S}$ubsampling framework for scalable cooperative MARL with heterogeneous agent interactions. By subsampling $\kappa$ agents according to interaction strength, we approximate the graphon-weighted mean-field and learn a policy with sample complexity $\mathrm{poly}(\kappa)$ and optimality gap $O(1/\sqrt{\kappa})$. We verify our theory with numerical simulations in robotic coordination, showing that $\texttt{GMFS}$ achieves near-optimal performance.
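The $O(1/\sqrt{\kappa})$ rate is the familiar Monte Carlo rate for an average of $\kappa$ samples. The following is a minimal sketch of that sampling effect only, not the paper's algorithm or its optimality-gap proof. The binary neighbor states, the uniform random interaction strengths, and sampling with replacement are all assumptions made for illustration, and `subsample_aggregate` and `rmse` are hypothetical helper names.

```python
import numpy as np

def subsample_aggregate(states, weights, kappa, rng):
    """Estimate the weighted mean of `states` from kappa neighbors drawn
    with probability proportional to `weights` (with replacement, assumed)."""
    probs = weights / weights.sum()
    idx = rng.choice(len(states), size=kappa, replace=True, p=probs)
    return states[idx].mean()

rng = np.random.default_rng(0)
n = 500
states = rng.integers(0, 2, size=n).astype(float)  # hypothetical binary neighbor states
weights = rng.uniform(0.1, 1.0, size=n)            # hypothetical interaction strengths
exact = np.dot(weights / weights.sum(), states)    # full weighted aggregate over all n neighbors

def rmse(kappa, trials=2000):
    """Root-mean-square error of the subsampled aggregate against the exact one."""
    est = np.array([subsample_aggregate(states, weights, kappa, rng)
                    for _ in range(trials)])
    return np.sqrt(np.mean((est - exact) ** 2))

# Quadrupling kappa should roughly halve the error if the rate is 1/sqrt(kappa).
errors = {k: rmse(k) for k in (4, 16, 64)}
for k, e in errors.items():
    print(f"kappa={k:3d}  rmse={e:.4f}")
```

Under these assumptions the estimator is an unbiased average of $\kappa$ independent draws, so its error shrinks by about a factor of two each time $\kappa$ is quadrupled.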

Paper Structure

This paper contains 26 sections, 43 theorems, 176 equations, 5 figures, 1 table, 3 algorithms.

Key Result

Theorem 4.1

For all states $s\in\mathcal{S}$ and graphon state-aggregates $g\in\mathcal{G}$, if $T\geq \frac{1}{1-\gamma}\log\frac{\|r_\ell\|_\infty \sqrt{\kappa}}{1-\gamma}$, then
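The horizon condition in Theorem 4.1 can be evaluated directly. The sketch below computes the smallest integer $T$ satisfying $T\geq \frac{1}{1-\gamma}\log\frac{\|r_\ell\|_\infty \sqrt{\kappa}}{1-\gamma}$, assuming the logarithm is natural. The function name `min_horizon` and the argument `r_max` (standing in for $\|r_\ell\|_\infty$) are hypothetical, and the code covers only the condition on $T$, not the conclusion of the theorem.

```python
import math

def min_horizon(r_max, kappa, gamma):
    """Smallest integer T with
    T >= (1 / (1 - gamma)) * log(r_max * sqrt(kappa) / (1 - gamma)).
    Natural logarithm is assumed; r_max stands in for ||r_l||_inf."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    if r_max <= 0 or kappa <= 0:
        raise ValueError("r_max and kappa must be positive")
    bound = math.log(r_max * math.sqrt(kappa) / (1.0 - gamma)) / (1.0 - gamma)
    return max(0, math.ceil(bound))  # a negative bound is met by T = 0

# Illustrative values only: unit reward bound, discount factor 0.9.
for kappa in (16, 64):
    print(kappa, min_horizon(r_max=1.0, kappa=kappa, gamma=0.9))
```

The required horizon grows only logarithmically in $\kappa$, while it scales roughly as $1/(1-\gamma)$ in the discount factor.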

Figures (5)

  • Figure 1: Graphon mean-field systems with distance-decay interactions. (a) Warehouse robots collaborate to transport a payload, where robot $i$ uses $\kappa=8$ subsampled neighbors with interaction strength $W(x_i,x_j)$ indicated by line thickness. (b) Traffic vehicles coordinate using graphon-weighted aggregates $g_i$ of the population, where interaction strength decays with distance in latent position space $\alpha \in [0,1]$.
  • Figure 2: Schematic of Graphon Mean-Field Sampling. (Left) A continuous graphon $W(x,y)$ represents the infinite-population limit of non-uniform interactions. (Middle) The graphon with deterministic latent positions $\{\alpha_i\}_{i=1}^n\subset [0,1]$ induces a complete weighted interaction graph on $n$ agents with edge weights $w_{ij} = W(\alpha_i, \alpha_j)$ and $w_{ii}=0$. These weights specify the intensity with which agent $i$ aggregates neighbor states into its mean-field features used for learning and control. (Right) Each agent approximates its graphon-weighted neighborhood statistics by sampling a small set of agents (random subsample of $\kappa = 5$) according to the normalized weights $\bar{w}_{ij}$.
  • Figure 3: Performance-scalability tradeoff of GMFS. (Left) GMFS rapidly achieves near-optimal performance in the robotics coordination task starting from around $\kappa = 8$, approaching the full graphon mean-field baseline at $\kappa = 24$, which corresponds to the optimal solution obtained without sampling (Hu et al., 2022; Fabian et al., 2022). (Right) The computational cost, measured by the number of entries in the discrete neighborhood state space $\mathcal{Z}_\kappa$, grows polynomially in $\kappa$.
  • Figure 4: Perception time-evolution comparison for the focal agent (center of the $5 \times 5$ grid) under the radial graphon. Rows correspond to $\kappa \in \{2, 8, 24\}$, while columns correspond to time horizons $t \in \{1, 30, 60, 100\}$. Each panel aggregates the focal agent’s sampled neighbors up to time $t$, with the dashed circle indicating the true interaction ball. As $\kappa$ increases, the empirical neighborhood density converges faster and more uniformly to the support of the radial graphon.
  • Figure 5: Visualization of a radial graphon. We note that generative AI was used to refine the aesthetics of this figure.
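The construction described in the Figure 2 caption can be sketched in a few lines: build edge weights $w_{ij} = W(\alpha_i, \alpha_j)$ with $w_{ii}=0$, normalize them, and let an agent draw $\kappa$ neighbors according to the normalized weights. This is a minimal sketch under stated assumptions. The exponential distance-decay graphon, the evenly spaced latent positions, the row normalization $\bar{w}_{ij} = w_{ij} / \sum_k w_{ik}$, and sampling with replacement are all illustrative choices, and the function names are hypothetical rather than taken from the paper.

```python
import numpy as np

def graphon_weights(alphas, W):
    """Edge weights w_ij = W(alpha_i, alpha_j), with w_ii = 0."""
    w = W(alphas[:, None], alphas[None, :])  # broadcast to an (n, n) matrix
    np.fill_diagonal(w, 0.0)                 # no self-interaction
    return w

def normalized_weights(w):
    """Row-normalize so that row i is a sampling distribution over neighbors (assumed form)."""
    return w / w.sum(axis=1, keepdims=True)

def sample_neighbors(w_bar, i, kappa, rng):
    """Draw kappa neighbor indices for agent i according to row i of w_bar."""
    return rng.choice(w_bar.shape[0], size=kappa, replace=True, p=w_bar[i])

def W(x, y):
    """Hypothetical distance-decay graphon on [0, 1]^2."""
    return np.exp(-5.0 * np.abs(x - y))

n = 25
alphas = (np.arange(n) + 0.5) / n            # evenly spaced latent positions (assumed)
w = graphon_weights(alphas, W)
w_bar = normalized_weights(w)

rng = np.random.default_rng(0)
neighbors = sample_neighbors(w_bar, i=12, kappa=5, rng=rng)
print(neighbors)
```

Because $w_{ii}=0$, an agent never samples itself, and agents that are close in latent position are drawn more often than distant ones.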

Theorems & Definitions (97)

  • Definition 2.1: Graphon-weighted neighborhood state-action feature for agent $i$
  • Definition 2.2: $\epsilon$-optimal policy
  • Definition 3.2: Graphon-weighted Subsampling
  • Definition 3.3: Sampled neighborhood aggregates
  • Definition 3.8: Bellman operator $\mathcal{T}$
  • Definition 3.9: Sampled Bellman operator $\hat{\mathcal{T}}_\kappa$
  • Definition 3.10: Empirical sampled operator $\widehat{\mathcal{T}}_{\kappa,m}$
  • Theorem 4.1
  • Lemma 4.2: Controlling the Bellman Noise
  • Theorem 4.3
  • ...and 87 more