A Unified Framework for Locality in Scalable MARL

Sourav Chakraborty; Amit Kiran Rege; Claire Monteleoni; Lijun Chen

TL;DR

Locality in MARL can be a policy-dependent phenomenon, not only a property of the environment. A novel decomposition of the policy-induced interdependence matrix shows that a smooth policy can induce locality even when the environment is strongly action-coupled, exposing a fundamental locality-optimality tradeoff.

Abstract

Scalable Multi-Agent Reinforcement Learning (MARL) is fundamentally challenged by the curse of dimensionality. A common solution is to exploit locality, which hinges on an Exponential Decay Property (EDP) of the value function. However, existing conditions that guarantee the EDP are often conservative, as they are based on worst-case, environment-only bounds (e.g., suprema over actions) and fail to capture the regularizing effect of the policy itself. In this work, we establish that locality can also be a \emph{policy-dependent} phenomenon. Our central contribution is a novel decomposition of the policy-induced interdependence matrix, $H^\pi$, which decouples the environment's sensitivity to state ($E^{\mathrm{s}}$) and action ($E^{\mathrm{a}}$) from the policy's sensitivity to state ($\Pi(\pi)$). This decomposition reveals that locality can be induced by a smooth policy (small $\Pi(\pi)$) even when the environment is strongly action-coupled, exposing a fundamental locality-optimality tradeoff. We use this framework to derive a general spectral condition $\rho(E^{\mathrm{s}}+E^{\mathrm{a}}\Pi(\pi)) < 1$ for exponential decay, which is strictly tighter than prior norm-based conditions. Finally, we leverage this theory to analyze a provably sound localized block-coordinate policy improvement framework with guarantees tied directly to this spectral radius.
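The spectral condition in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's procedure: the two-agent matrices `E_s` and `E_a`, the choice of a scaled identity for $\Pi(\pi)$, and all helper names are hypothetical. It relies only on the standard fact that $\rho(M) \le \|M^k\|_\infty^{1/k}$ for every $k \ge 1$, so a value below 1 certifies $\rho(M) < 1$. A one-step $\infty$-norm test on the same matrix stands in here for a norm-based condition.

```python
# Sketch: certify rho(E_s + E_a * Pi) < 1 on small, made-up matrices.
# Every number below is hypothetical and not taken from the paper.

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matadd(A, B):
    """Entrywise sum of two matrices of equal shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def inf_norm(A):
    """Induced infinity norm: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)

def spectral_radius_upper_bound(M, k=20):
    """Return ||M^k||_inf^(1/k), an upper bound on rho(M) for any k >= 1."""
    P = M
    for _ in range(k - 1):
        P = matmul(P, M)
    return inf_norm(P) ** (1.0 / k)

def influence_matrix(E_s, E_a, policy_sensitivity):
    """Build E_s + E_a * Pi, assuming Pi = policy_sensitivity * identity."""
    n = len(E_s)
    Pi = [[policy_sensitivity if i == j else 0.0 for j in range(n)]
          for i in range(n)]
    return matadd(E_s, matmul(E_a, Pi))

E_s = [[0.1, 0.2], [0.0, 0.1]]   # hypothetical state sensitivity
E_a = [[0.0, 2.0], [0.5, 0.0]]   # hypothetical strong action coupling

smooth = influence_matrix(E_s, E_a, 0.5)  # smoother policy
sharp = influence_matrix(E_s, E_a, 1.0)   # sharper policy

for name, M in (("smooth", smooth), ("sharp", sharp)):
    bound = spectral_radius_upper_bound(M)
    print(name, "inf-norm:", inf_norm(M), "rho bound:", bound,
          "certified:", bound < 1.0)
```

In this toy instance the smoother policy fails the one-step norm test (row sum above 1) yet is certified by the spectral bound, while the sharper policy is not certified at all, which mirrors the locality-optimality tradeoff described above.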

Paper Structure (27 sections, 13 theorems, 88 equations, 1 algorithm)

Introduction
Setup and preliminaries
Policy–induced influence and locality
Extension: Discounted setting
The Algorithmic Link: Controlling Interdependence
A Provable Framework for Localized Policy Improvement
Phase 1: Localized evaluation via message passing
Phase 2: Localized policy improvement via block–coordinate KL updates
Related work
Proofs from Section \ref{sec:influence}
Proof of Proposition \ref{prop:decomposition}
Proof of Lemma \ref{lem:one-step}
Proof of Theorem \ref{thm:poisson}
Spatial decay as a corollary of sparsity
Average–reward localized evaluation and improvement (synchronous)
...and 12 more sections

Key Result

Proposition 1

For any policy $\pi$ of product form and any synchronous dynamics $P(\cdot\mid s,a)$ on a finite state space, the interdependence matrix $C^\pi$ of the policy–induced kernel satisfies the entrywise bound $C^\pi \ \preceq\ E^{\mathrm{s}} + E^{\mathrm{a}}\, \Pi$. In particular, for each pair of indice

Theorems & Definitions (26)

Proposition 1: Decomposition of policy–induced influence
Lemma 1: Oscillation bound via $H^\pi$
Theorem 1: Policy–uniform synchronous contraction and Poisson decay
Definition 1: Softmax Policy
Definition 2: Logit Lipschitz Constant
Lemma 2: Softmax Temperature Controls $\Pi(\pi)$
Theorem 2: Localized evaluation: certificate and bias
Lemma 3: Softmax Lipschitz constant in total variation
Proof
Definition 3: Local logits and per-coordinate logit Lipschitz constants
...and 16 more
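
Lemma 2 and Lemma 3 in the list above tie the softmax temperature to the policy's sensitivity in total variation. The following is a minimal sketch of that qualitative effect, under assumptions that are not taken from the paper: logits are divided by a temperature `tau`, and the two logit vectors are made-up values standing in for the same agent's logits at two nearby states. The exact constants in the lemmas are not reproduced here.

```python
import math

def softmax(logits, tau):
    """Softmax with temperature tau (assumed convention: logits / tau)."""
    scaled = [z / tau for z in logits]
    m = max(scaled)                      # subtract the max for stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical logits of one agent at two states that differ slightly.
logits_state_a = [1.0, 0.0, -1.0]
logits_state_b = [0.0, 1.0, -1.0]

temperatures = [0.5, 1.0, 2.0, 4.0]
tv_by_temperature = [
    tv_distance(softmax(logits_state_a, tau), softmax(logits_state_b, tau))
    for tau in temperatures
]

for tau, tv in zip(temperatures, tv_by_temperature):
    print(f"tau={tau}: TV distance between the two action distributions = {tv:.4f}")
```

In this example, raising the temperature shrinks the total variation gap between the two action distributions, which is the sense in which a smoother softmax policy has a smaller state sensitivity.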
