
A Unified Framework for Locality in Scalable MARL

Sourav Chakraborty, Amit Kiran Rege, Claire Monteleoni, Lijun Chen

TL;DR

Locality in MARL can be a policy-dependent phenomenon. A novel decomposition of the policy-induced interdependence matrix shows that a smooth policy can induce locality even when the environment is strongly action-coupled, exposing a fundamental locality-optimality tradeoff.

Abstract

Scalable Multi-Agent Reinforcement Learning (MARL) is fundamentally challenged by the curse of dimensionality. A common solution is to exploit locality, which hinges on an Exponential Decay Property (EDP) of the value function. However, existing conditions that guarantee the EDP are often conservative, as they are based on worst-case, environment-only bounds (e.g., supremums over actions) and fail to capture the regularizing effect of the policy itself. In this work, we establish that locality can also be a policy-dependent phenomenon. Our central contribution is a novel decomposition of the policy-induced interdependence matrix, $H^\pi$, which decouples the environment's sensitivity to state ($E^{\mathrm{s}}$) and action ($E^{\mathrm{a}}$) from the policy's sensitivity to state ($\Pi(\pi)$). This decomposition reveals that locality can be induced by a smooth policy (small $\Pi(\pi)$) even when the environment is strongly action-coupled, exposing a fundamental locality-optimality tradeoff. We use this framework to derive a general spectral condition $\rho(E^{\mathrm{s}}+E^{\mathrm{a}}\,\Pi(\pi)) < 1$ for exponential decay, which is strictly tighter than prior norm-based conditions. Finally, we leverage this theory to analyze a provably-sound localized block-coordinate policy improvement framework with guarantees tied directly to this spectral radius.
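
To make the spectral condition concrete, the following is a minimal sketch, not the authors' code. It forms $E^{\mathrm{s}}+E^{\mathrm{a}}\,\Pi(\pi)$ for a two-agent toy problem and compares its spectral radius with its induced infinity norm. All matrix entries are invented for illustration, the policy-sensitivity matrix is assumed to be a scalar multiple of the identity, and the helper names (`interdependence`, `spectral_radius_bounds`) are hypothetical. The spectral radius is bracketed with the Collatz-Wielandt ratios, which are valid for an entrywise positive matrix.

```python
def mat_mul(A, B):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_add(A, B):
    """Entrywise sum of two matrices of equal shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inf_norm(A):
    """Induced infinity norm: the maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in A)


def spectral_radius_bounds(A, iters=500):
    """Collatz-Wielandt bounds (lower, upper) on rho(A).

    For an entrywise positive A and a positive vector x,
    min_i (Ax)_i / x_i <= rho(A) <= max_i (Ax)_i / x_i.
    Power iteration on x tightens both bounds.
    """
    if any(v <= 0 for row in A for v in row):
        raise ValueError("this sketch assumes an entrywise positive matrix")
    n = len(A)
    x = [1.0] * n
    lo, hi = 0.0, float("inf")
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        ratios = [y[i] / x[i] for i in range(n)]
        lo, hi = min(ratios), max(ratios)
        total = sum(y)
        x = [v / total for v in y]  # renormalize; x stays positive
    return lo, hi


def interdependence(E_s, E_a, Pi):
    """E^s + E^a Pi, the matrix appearing in the spectral condition."""
    return mat_add(E_s, mat_mul(E_a, Pi))


# Illustrative (made-up) sensitivities for two agents.
E_s = [[0.1, 0.0], [0.0, 0.1]]        # weak state coupling
E_a = [[0.0, 2.0], [0.5, 0.0]]        # strong, asymmetric action coupling
Pi_smooth = [[0.6, 0.0], [0.0, 0.6]]  # smooth policy: small Pi(pi)
Pi_sharp = [[2.0, 0.0], [0.0, 2.0]]   # sharp policy: large Pi(pi)

H_smooth = interdependence(E_s, E_a, Pi_smooth)
H_sharp = interdependence(E_s, E_a, Pi_sharp)

lo_smooth, hi_smooth = spectral_radius_bounds(H_smooth)
lo_sharp, hi_sharp = spectral_radius_bounds(H_sharp)

print("smooth: rho in", (lo_smooth, hi_smooth), "inf-norm", inf_norm(H_smooth))
print("sharp:  rho in", (lo_sharp, hi_sharp), "inf-norm", inf_norm(H_sharp))
```

In this toy instance the smooth policy gives a spectral radius below 1 while the infinity norm exceeds 1, so a norm-based test would fail to certify decay that the spectral test certifies. The sharp policy pushes the spectral radius above 1 with the same environment matrices, which is the locality-optimality tension the abstract describes.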

Paper Structure (27 sections, 13 theorems, 88 equations, 1 algorithm)

Key Result

Proposition 1 (Decomposition of policy–induced influence)

For any policy $\pi$ of product form and any synchronous dynamics $P(\cdot\mid s,a)$ on a finite state space, the interdependence matrix $C^\pi$ of the policy–induced kernel satisfies the entrywise bound $C^\pi \ \preceq\ E^{\mathrm{s}} + E^{\mathrm{a}}\, \Pi$. In particular, for each pair of indice

Theorems & Definitions (26)

  • proposition 1: Decomposition of policy–induced influence
  • lemma 1: Oscillation bound via $H^\pi$
  • theorem 1: Policy–uniform synchronous contraction and Poisson decay
  • definition 1: Softmax Policy
  • definition 2: Logit Lipschitz Constant
  • lemma 2: Softmax Temperature Controls $\Pi(\pi)$
  • theorem 2: Localized evaluation: certificate and bias
  • lemma 3: Softmax Lipschitz constant in total variation
  • proof
  • definition 3: Local logits and per-coordinate logit Lipschitz constants
  • ...and 16 more
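
The entries on softmax policies and temperature suggest that a higher temperature yields a smoother policy. The sketch below is a numerical illustration of that intuition only and does not reproduce the paper's lemmas. It assumes a single agent with two actions whose logits swap between two hypothetical states, and measures how far the softmax action distribution moves in total variation at several temperatures. The logits, the temperatures, and the function names are all assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature):
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical logits of the same agent in two different states.
logits_s = [1.0, 0.0]
logits_s_prime = [0.0, 1.0]

temperatures = [0.1, 1.0, 10.0]
tv_by_temperature = [
    tv_distance(softmax(logits_s, t), softmax(logits_s_prime, t))
    for t in temperatures
]

for t, tv in zip(temperatures, tv_by_temperature):
    print(f"temperature={t}: TV between the two action distributions = {tv:.4f}")
```

For this two-action case the distance has the closed form $\tanh(1/(2\tau))$ at temperature $\tau$, so it shrinks toward zero as the temperature grows: the policy's response to a state change is damped, at the cost of a more uniform (less greedy) action distribution.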