
Locally Interdependent Multi-Agent MDP: Theoretical Framework for Decentralized Agents with Dynamic Dependencies

Alex DeWeese, Guannan Qu

TL;DR

The paper introduces Locally Interdependent Multi-Agent MDPs to model decentralized agents with dynamically changing dependencies driven by proximity, where agents within distance $\mathcal{R}$ influence rewards and those within $\mathcal{V}$ can communicate. It develops three closed-form, group-decentralized policies—Amalgam, Cutoff, and First Step Finite Horizon—and proves near-optimal guarantees with bounds of the form $|V^*(s)-V^{\text{policy}}(s)|\le C\gamma^{c+1}\tilde r$, where $c=\left\lfloor\frac{\mathcal{V}-\mathcal{R}}{2}\right\rfloor$ and $\tilde r$ captures reward magnitude. A corresponding lower bound shows these results are tight up to constants, and a Telescoping Lemma establishes how to convert naive policy analyses into the final guarantees. The framework further offers scalable extensions (e.g., eliminating, splitting, or approximating large groups) and demonstrates long-horizon behavior via simulations in cooperative navigation, obstacle avoidance, and formation control. This work provides a theoretically grounded, scalable approach to decentralized RL in settings with dynamic dependencies among agents.

Abstract

Many multi-agent systems in practice are decentralized and have dynamically varying dependencies, yet there have been few attempts in the literature to analyze these systems theoretically. In this paper, we propose and theoretically analyze a decentralized model with dynamically varying dependencies called the Locally Interdependent Multi-Agent MDP. This model can represent problems in many disparate domains such as cooperative navigation, obstacle avoidance, and formation control. Despite the intractability that general partially observable multi-agent systems suffer from, we propose three closed-form policies that are theoretically near-optimal in this setting and can be computed and stored in a scalable way. Consequently, we reveal a fundamental property of Locally Interdependent Multi-Agent MDPs: the partially observable decentralized solution is exponentially close to the fully observable solution with respect to the visibility radius. We then discuss extensions of our closed-form policies to further improve tractability. We conclude with simulations that investigate long-horizon behaviors of our closed-form policies.

Paper Structure (52 sections, 18 theorems, 34 equations, 10 figures)

Key Result

Theorem 3.1

$\lvert V^*(s) - V^{\lambda} (s) \rvert \leq \frac{2}{(1 - \gamma)^2}\gamma^{c + 1} \tilde{r}$.
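To make the exponential dependence on the visibility radius concrete, the sketch below evaluates the right-hand side of Theorem 3.1, taking $c=\left\lfloor\frac{\mathcal{V}-\mathcal{R}}{2}\right\rfloor$ from the TL;DR. The function names are hypothetical, and the input checks encode our own assumptions ($0 \le \gamma < 1$ and $\mathcal{V} \ge \mathcal{R}$); this is an illustrative calculator, not code from the paper.

```python
import math


def cutoff_c(V: float, R: float) -> int:
    """c = floor((V - R) / 2), as stated in the TL;DR."""
    # Assumption (ours): the bound is used in the regime V >= R.
    if V < R:
        raise ValueError("expects visibility radius V >= reward radius R")
    return math.floor((V - R) / 2)


def theorem_3_1_bound(gamma: float, V: float, R: float, r_tilde: float) -> float:
    """Right-hand side of Theorem 3.1: 2 / (1 - gamma)^2 * gamma^(c + 1) * r_tilde."""
    # Assumption (ours): a standard discount factor in [0, 1).
    if not 0.0 <= gamma < 1.0:
        raise ValueError("discount factor must lie in [0, 1)")
    c = cutoff_c(V, R)
    return 2.0 / (1.0 - gamma) ** 2 * gamma ** (c + 1) * r_tilde


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show how the bound shrinks as V grows.
    for V in (25, 35, 45):
        print(V, theorem_3_1_bound(gamma=0.9, V=V, R=5, r_tilde=1.0))
```

Because $c$ grows by one each time $\mathcal{V}$ increases by 2, the bound is multiplied by $\gamma$ at each such step. This is the sense in which the decentralized solution approaches the fully observable one exponentially fast in the visibility radius.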

Figures (10)

  • Figure 1: 3 agents moving in the space of $\mathcal{X} = \mathbb{R}^2$ with standard Euclidean distance. The bottom two agents potentially have an interdependent reward since they are within distance $\mathcal{R}$ of one another. Furthermore, every agent is within distance $\mathcal{V}$ of another agent so all agents can communicate with each other. Notably, the top and bottom agents may communicate even though they are not within distance $\mathcal{V}$ of each other.
  • Figure 2: Bullseye Problem: In red is the optimal policy with a discounted sum of rewards of $8.85$. The top three in blue are Amalgam Policy rollouts with $\mathcal{V}=25$, $\mathcal{V}=35$, and $\mathcal{V}=45$, from top to bottom. They have total discounted rewards of $6.74$, $8.26$, and $8.85$, respectively. Therefore, $\lvert V^*(s) - V^{\lambda}(s)\rvert$ is $2.11$, $0.59$, and $0$, respectively. In green is the Cutoff Policy with $\mathcal{V} = 25$, which obtains a discounted reward of $-5.38$. All reported discounted sums of rewards are rounded to the second decimal place.
  • Figure 3: Aisle Walk Problem: In red is the optimal policy with a discounted reward of $496.84$, in blue is the Amalgam Policy with a discounted reward of $234.40$, and in green is the Cutoff Policy with a discounted reward of $400$. All reported discounted sums of rewards are rounded to the second decimal place.
  • Figure 4: Highway Problem with Amalgam and Optimal Policy: In red is the optimal policy with a discounted reward of $73.5$, and in blue is the Amalgam Policy with a discounted reward of $70.93$. All reported discounted rewards are rounded to the second decimal place.
  • Figure 5: Highway Problem with Cutoff Policy: In green is the Cutoff Policy with an accumulated discounted reward of $0$.
  • ...and 5 more figures
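Figure 1 suggests that communication is transitive: two agents can communicate whenever a chain of agents, each within $\mathcal{V}$ of the next, links them. The sketch below follows that reading. It groups agents by the connected components of the $\mathcal{V}$-proximity graph, computed with union-find, and lists the pairs within $\mathcal{R}$ that may share an interdependent reward. The function names, the use of Euclidean distance in $\mathbb{R}^2$, and treating "within distance" as $\le$ are our assumptions for illustration, not the paper's implementation.

```python
import math
from itertools import combinations

Point = tuple[float, float]


def communication_groups(positions: list[Point], V: float) -> list[list[int]]:
    """Connected components of the graph that links agents within distance V (assumed <=)."""
    parent = list(range(len(positions)))

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the components of every pair of agents that can see each other directly.
    for i, j in combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) <= V:
            parent[find(i)] = find(j)

    # Collect agents by root; chains of direct links end up in the same group.
    groups: dict[int, list[int]] = {}
    for i in range(len(positions)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def reward_pairs(positions: list[Point], R: float) -> list[tuple[int, int]]:
    """Pairs of agents within distance R, whose rewards may be interdependent."""
    return [
        (i, j)
        for i, j in combinations(range(len(positions)), 2)
        if math.dist(positions[i], positions[j]) <= R
    ]


if __name__ == "__main__":
    # Hypothetical coordinates loosely mimicking Figure 1: one top agent, two bottom agents.
    agents = [(1.0, 6.0), (0.0, 0.0), (2.0, 1.0)]
    print(communication_groups(agents, V=5.5))
    print(reward_pairs(agents, R=3.0))
```

In this hypothetical layout, the top agent and the bottom-left agent are more than $\mathcal{V}$ apart. They still land in one group because both are linked through the bottom-right agent, which mirrors the situation described in Figure 1.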

Theorems & Definitions (35)

  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Theorem 3.4
  • Corollary 3.5
  • Lemma 4.1
  • proof
  • Lemma 4.2
  • Lemma 4.3
  • Lemma 4.4
  • ...and 25 more