Locally Interdependent Multi-Agent MDP: Theoretical Framework for Decentralized Agents with Dynamic Dependencies
Alex DeWeese, Guannan Qu
TL;DR
The paper introduces Locally Interdependent Multi-Agent MDPs to model decentralized agents with dynamically changing dependencies driven by proximity, where agents within distance $\mathcal{R}$ influence rewards and those within $\mathcal{V}$ can communicate. It develops three closed-form, group-decentralized policies—Amalgam, Cutoff, and First Step Finite Horizon—and proves near-optimal guarantees with bounds of the form $|V^*(s)-V^{\text{policy}}(s)|\le C\gamma^{c+1}\tilde r$, where $c=\left\lfloor\frac{\mathcal{V}-\mathcal{R}}{2}\right\rfloor$ and $\tilde r$ captures reward magnitude. A corresponding lower bound shows these results are tight up to constants, and a Telescoping Lemma establishes how to convert naive policy analyses into the final guarantees. The framework further offers scalable extensions (e.g., eliminating, splitting, or approximating large groups) and demonstrates long-horizon behavior via simulations in cooperative navigation, obstacle avoidance, and formation control. This work provides a theoretically grounded, scalable approach to decentralized RL in settings with dynamic dependencies among agents.
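The two radii above play different roles, and a small sketch can make the distinction concrete. It lists which agent pairs fall within the dependence radius $\mathcal{R}$ and which fall within the visibility radius $\mathcal{V}$. The Euclidean metric, the inclusive comparison (`<=`), the function name `proximity_pairs`, and the example positions are all illustrative assumptions rather than definitions from the paper.

```python
import itertools
import math


def proximity_pairs(positions, radius):
    """Return index pairs (i, j), i < j, whose Euclidean distance is <= radius.

    Assumption: Euclidean distance and an inclusive threshold; the paper's
    own metric and boundary convention may differ.
    """
    pairs = set()
    for (i, p), (j, q) in itertools.combinations(enumerate(positions), 2):
        if math.dist(p, q) <= radius:
            pairs.add((i, j))
    return pairs


# Hypothetical agent positions on a line (not taken from the paper).
positions = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (10.0, 0.0)]
R = 1.5  # dependence radius: pairs this close influence rewards
V = 4.5  # visibility radius: pairs this close can communicate

dependent = proximity_pairs(positions, R)
communicating = proximity_pairs(positions, V)

print("reward-dependent pairs:", sorted(dependent))
print("communicating pairs:   ", sorted(communicating))
```

Because $\mathcal{V} \ge \mathcal{R}$ in this example, every reward-dependent pair is also a communicating pair. As agents move, both sets change, which is the dynamic dependency the model is meant to capture.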
Abstract
Many multi-agent systems in practice are decentralized and have dynamically varying dependencies, yet few attempts have been made in the literature to analyze these systems theoretically. In this paper, we propose and theoretically analyze a decentralized model with dynamically varying dependencies called the Locally Interdependent Multi-Agent MDP. This model can represent problems in many disparate domains such as cooperative navigation, obstacle avoidance, and formation control. Despite the intractability that general partially observable multi-agent systems suffer from, we propose three closed-form policies that are theoretically near-optimal in this setting and can be scalable to compute and store. Consequently, we reveal a fundamental property of Locally Interdependent Multi-Agent MDPs: the gap between the partially observable decentralized solution and the fully observable solution shrinks exponentially in the visibility radius. We then discuss extensions of our closed-form policies that further improve tractability. We conclude with simulations that investigate some long-horizon behaviors of our closed-form policies.
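To illustrate the dependence on the visibility radius, the sketch below evaluates the bound form $C\gamma^{c+1}\tilde r$ with $c=\left\lfloor\frac{\mathcal{V}-\mathcal{R}}{2}\right\rfloor$ for a few values of $\mathcal{V}$. The constant $C$, the discount factor $\gamma$, the reward scale $\tilde r$, and the function names are placeholder assumptions chosen for illustration; the paper's actual constants are not reproduced here.

```python
import math


def compute_c(V, R):
    """c = floor((V - R) / 2), as in the bound's exponent. Assumes V >= R."""
    return math.floor((V - R) / 2)


def suboptimality_bound(V, R, gamma, C=1.0, r_tilde=1.0):
    """Evaluate C * gamma**(c + 1) * r_tilde.

    C and r_tilde default to 1.0 purely as placeholders; they are not
    the constants derived in the paper.
    """
    c = compute_c(V, R)
    return C * gamma ** (c + 1) * r_tilde


# Hypothetical setting: fixed dependence radius, growing visibility radius.
R, gamma = 1, 0.9
for V in (1, 3, 5, 9, 21):
    print(f"V={V:2d}  c={compute_c(V, R):2d}  "
          f"bound={suboptimality_bound(V, R, gamma):.4f}")
```

Each increase of $\mathcal{V}$ by 2 raises $c$ by 1 and multiplies the bound by $\gamma$, so the gap decays geometrically as visibility grows.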
