United We Stand: Decentralized Multi-Agent Planning With Attrition

Nhat Nguyen, Duong Nguyen, Gianluca Rizzo, Hung Nguyen

TL;DR

This work proposes Attritable MCTS (A-MCTS), a decentralized MCTS algorithm capable of timely and efficient adaptation to changes in the set of active agents, based on the use of a global reward function for the estimation of each agent's local contribution, and regret matching for coordination.

Abstract

Decentralized planning is a key element of cooperative multi-agent systems for information gathering tasks. However, despite the high frequency of agent failures in realistic large deployment scenarios, current approaches perform poorly in the presence of failures, by not converging at all, and/or by making very inefficient use of resources (e.g. energy). In this work, we propose Attritable MCTS (A-MCTS), a decentralized MCTS algorithm capable of timely and efficient adaptation to changes in the set of active agents. It is based on the use of a global reward function for the estimation of each agent's local contribution, and regret matching for coordination. We evaluate its effectiveness in realistic data-harvesting problems under different scenarios. We show both theoretically and experimentally that A-MCTS enables efficient adaptation even under high failure rates. Results suggest that, in the presence of frequent failures, our solution improves substantially over the best existing approaches in terms of global utility and scalability.
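The abstract names regret matching as the coordination mechanism. The following is a minimal sketch of generic regret matching on a small identical-interest game, not the authors' A-MCTS procedure. The function names, the toy `coverage_utility`, and all parameter values are hypothetical, chosen only for illustration. Each agent plays actions with probability proportional to its positive cumulative regret, and falls back to a uniform distribution when no regret is positive.

```python
import random


def regret_matching_policy(cum_regret):
    """Mixed strategy proportional to positive cumulative regret (uniform if none)."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def run_regret_matching(utility, n_agents, n_actions, iterations=2000, seed=0):
    """Repeated play of a shared-utility game; returns per-agent action counts."""
    rng = random.Random(seed)
    cum_regret = [[0.0] * n_actions for _ in range(n_agents)]
    counts = [[0] * n_actions for _ in range(n_agents)]
    for _ in range(iterations):
        policies = [regret_matching_policy(r) for r in cum_regret]
        joint = [rng.choices(range(n_actions), weights=p)[0] for p in policies]
        realized = utility(joint)
        for i in range(n_agents):
            counts[i][joint[i]] += 1
            for a in range(n_actions):
                # Regret: what agent i would have gained by playing a instead,
                # with every other agent's action held fixed.
                alternative = list(joint)
                alternative[i] = a
                cum_regret[i][a] += utility(alternative) - realized
    return counts


def coverage_utility(joint):
    """Toy global utility (assumption): number of distinct sites visited."""
    return float(len(set(joint)))


if __name__ == "__main__":
    print(run_regret_matching(coverage_utility, n_agents=2, n_actions=2))
```

In this toy game the global utility is highest when the two agents pick different sites, so the regret updates push them away from duplicated choices. The sketch uses a single shared utility for all agents, which mirrors the cooperative setting described in the abstract but simplifies everything else.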

Paper Structure (17 sections, 4 theorems, 34 equations, 5 figures, 2 algorithms)


Key Result

Proposition 1

If the global objective function $U_g$ is submodular, then, by the diminishing-returns property, $F_n^{(t+1)}(x^*_n) \ge F_n^{(t)}(x^*_n)$, where $F_n(x_n)$ is defined in (eq:marginal).
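The diminishing-returns argument can be illustrated numerically. The sketch below is built on assumptions, since the definition of $F_n$ in (eq:marginal) is not reproduced in this summary. It treats $F_n$ as the marginal contribution of agent $n$'s plan to a weighted-coverage utility, which is a standard submodular function, and it reads the step from $t$ to $t+1$ as the loss of another agent. All names and values are hypothetical.

```python
def coverage(plans, value):
    """Weighted coverage: total value of rewards reached by at least one plan."""
    covered = set().union(*plans)
    return sum(value[r] for r in covered)


def marginal_contribution(own_plan, other_plans, value):
    """Gain in global utility from adding own_plan to the other agents' plans."""
    return coverage(other_plans + [own_plan], value) - coverage(other_plans, value)


# Hypothetical reward values and plans (sets of reward identifiers).
value = {"a": 3.0, "b": 2.0, "c": 1.0}
own_plan = {"a", "b"}
others_before = [{"a"}, {"b", "c"}]  # two other agents are active
others_after = [{"a"}]               # the second one has failed

before = marginal_contribution(own_plan, others_before, value)
after = marginal_contribution(own_plan, others_after, value)
print(before, after)
```

With these values the agent's plan adds nothing while both other agents are active, because its rewards are already covered, and it adds the value of reward `b` once the second agent is gone. The same plan therefore has a marginal contribution that does not decrease when the set of other agents shrinks, which is the direction of the inequality in Proposition 1.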

Figures (5)

  • Figure 1: Overview of the A-MCTS algorithm. Agents incrementally grow the search tree using the best response policy $x^{BR}$ and communicate their best actions $\hat{\mathcal{X}}$. Regret Matching is then used to compute, in a distributed manner, a joint policy for the cooperative game. These solutions are synchronized and the payoff-dominant one is chosen as the best response policy $x^{BR}$.
  • Figure 2: Impact of different parameters on the algorithms' performance at the end of the mission: failure intensity, i.e. the fraction of agents that fail (a); planning time (b); number of exchanged components (c); action budget (d); number of agents (e); number of rewards (f); and communication failure probability (g). Results are shown with $95\%$ confidence intervals.
  • Figure 3: Impact of allowed inter-agent communication loss on the performance of A-MCTS at the end of the mission.
  • Figure 4: Diamonds collection game (a), number of diamonds collected (b), and D-UCB score for each action of Agent 1 (c).
  • Figure 5: Evolution over the mission of the Instantaneous Reward Coverage (IRC) in the Forced Failure setting for different times of attrition: no attrition (a), attrition after 2 actions (b), attrition after 4 actions (c), and attrition after 6 actions (d). Results are shown with $95\%$ confidence intervals.

Theorems & Definitions (7)

  • Definition 1: Submodular set function
  • Proposition 1
  • Definition 2: Pure-Strategy Nash Equilibrium
  • Theorem 2
  • Lemma 3
  • Lemma 4
  • proof