A Semi-Decentralized Approach to Multiagent Control

Mahdi Al-Husseini, Mykel J. Kochenderfer, Kyle H. Wray

TL;DR

This paper extends semi-decentralization to the partially observable Markov decision process (POMDP) and presents recursive small-step semi-decentralized A* (RS-SDA*), an exact algorithm for generating optimal SDec-POMDP policies.

Abstract

We introduce an expressive framework and algorithms for the semi-decentralized control of cooperative agents in environments with communication uncertainty. Whereas semi-Markov control admits a distribution over time for agent actions, semi-Markov communication, or what we refer to as semi-decentralization, gives a distribution over time for what actions and observations agents can store in their histories. We extend semi-decentralization to the partially observable Markov decision process (POMDP). The resulting SDec-POMDP unifies decentralized and multiagent POMDPs and several existing explicit communication mechanisms. We present recursive small-step semi-decentralized A* (RS-SDA*), an exact algorithm for generating optimal SDec-POMDP policies. RS-SDA* is evaluated on semi-decentralized versions of several standard benchmarks and a maritime medical evacuation scenario. This paper provides a well-defined theoretical foundation for exploring many classes of multiagent communication problems through the lens of semi-decentralization.
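
The distinction the abstract draws, that communication determines which actions and observations each agent can store in its history, can be illustrated with a toy sketch. This is not the paper's SDec-POMDP model or RS-SDA*. The function names, the tuple layout of a history record, and the all-or-nothing Bernoulli link model in `sample_communicating` are assumptions made only for illustration.

```python
import random


def update_histories(histories, actions, observations, communicating):
    """Append one step of (agent, action, observation) records to each history.

    Every agent stores its own record. Agents in `communicating` also store
    the records of the other communicating agents for this step.
    (Hypothetical helper, not taken from the paper.)
    """
    records = {i: (i, actions[i], observations[i]) for i in histories}
    for i, history in histories.items():
        history.append(records[i])
        if i in communicating:
            for j in sorted(communicating):
                if j != i:
                    history.append(records[j])
    return histories


def sample_communicating(agents, p, rng):
    """Assumed link model: all agents communicate with probability p, else none."""
    return set(agents) if rng.random() < p else set()


# Two steps: communication succeeds at the first step and fails at the second.
histories = {"a1": [], "a2": []}
update_histories(
    histories,
    actions={"a1": "listen", "a2": "listen"},
    observations={"a1": "hear-left", "a2": "hear-right"},
    communicating={"a1", "a2"},
)
update_histories(
    histories,
    actions={"a1": "listen", "a2": "open-left"},
    observations={"a1": "hear-left", "a2": "hear-left"},
    communicating=set(),
)
```

In this sketch, if `communicating` always contains every agent, all agents accumulate the same joint history, which resembles centralized (MPOMDP-style) control. If it is always empty, each agent keeps only its own record, as in a Dec-POMDP. Drawing the set from `sample_communicating` with an intermediate `p` gives a setting between the two.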

Paper Structure

This paper contains 25 sections, 18 theorems, 21 equations, 9 figures, 3 tables, and 1 algorithm.

Key Result

Lemma 1

SDec-POMDP and MPOMDP models are equivalent.

Figures (9)

  • Figure 1: A semi-decentralized multiagent evacuation scenario with probabilistic restrictions on communication. Aircraft and watercraft must coordinate under communication constraints to move patients from aid stations to hospitals.
  • Figure 2: The SDec-POMDP dynamic decision network, with the policy infrastructure on the left and model on the right. The green backdrop contains the blackboard with memory $M_c$ generated from the histories of communicating agents. The gray backdrop with plate notation includes the individual agent memories $M_i$. $Z$ selector nodes are selectively toggled by $\bar{\tau}$ to facilitate memory propagation $\eta$, represented by dashed lines. Policy $\psi$ edges are represented by dotted lines. The SDec-POMDP framework is flexible and can be easily modified to capture the structural and informational characteristics of different problem domains.
  • Figure 3: Illustrating RS-SDA* applied to SDec-Tiger using mixed component policies through stage $\sigma = 2$.
  • Figure 4: MaritimeMEDEVAC environment representation and centralized/decentralized/semi-decentralized optimal policy values for horizons one through eight.
  • Figure 5: Illustration of four of nine possible joint actions for SDec-Tiger. Agents communicate their observation histories with some probability when they listen to the same door (in green).
  • ...and 4 more figures

Theorems & Definitions (18)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Proposition 1
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • ...and 8 more