Boltzmann-based Exploration for Robust Decentralized Multi-Agent Planning

Nhat Nguyen, Duong Nguyen, Gianluca Rizzo, Hung Nguyen

TL;DR

Coordinated Boltzmann MCTS (CB-MCTS) replaces the deterministic UCT rule with a stochastic Boltzmann policy and a decaying entropy bonus, giving sustained yet focused exploration in multi-agent MCTS.

Abstract

Decentralized Monte Carlo Tree Search (Dec-MCTS) is widely used for cooperative multi-agent planning but struggles in sparse or skewed reward environments. We introduce Coordinated Boltzmann MCTS (CB-MCTS), which replaces deterministic UCT with a stochastic Boltzmann policy and a decaying entropy bonus for sustained yet focused exploration. While Boltzmann exploration has been studied in single-agent MCTS, applying it in multi-agent systems poses unique challenges. CB-MCTS is the first to address this. We analyze CB-MCTS in the simple-regret setting and show in simulations that it outperforms Dec-MCTS in deceptive scenarios and remains competitive on standard benchmarks, providing a robust solution for multi-agent planning.
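
To make the contrast with deterministic UCT concrete, the following is a minimal sketch of generic Boltzmann (softmax) action selection with an entropy term whose weight decays over time. It is not the paper's algorithm: the abstract does not give the CB-MCTS update rules, so the function names, the `temperature` parameter, and the `1 / (1 + decay * visits)` schedule are all illustrative assumptions.

```python
import math
import random


def boltzmann_probs(q_values, temperature):
    """Softmax over value estimates; a higher temperature gives a flatter distribution."""
    best = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - best) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sample_action(q_values, temperature, rng=random):
    """Draw an action index stochastically instead of taking a deterministic argmax."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def decayed_weight(initial_weight, visits, decay=0.01):
    """Hypothetical decay schedule (an assumption, not taken from the paper):
    the entropy-bonus weight shrinks as the visit count grows."""
    return initial_weight / (1.0 + decay * visits)


if __name__ == "__main__":
    q = [0.1, 0.5, 0.4]
    probs = boltzmann_probs(q, temperature=0.5)
    bonus = decayed_weight(1.0, visits=200) * entropy(probs)
    print(probs, bonus, sample_action(q, temperature=0.5))
```

Because every action keeps a nonzero selection probability, a branch whose early value estimates look poor can still be revisited, which is the intuition behind preferring a stochastic policy in sparse or deceptive reward settings.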

Paper Structure

This paper contains 23 sections, 4 theorems, 40 equations, 25 figures, 2 tables, 2 algorithms.

Key Result

Lemma 1

For a fixed $\gamma$, there exists a value $D$ such that Dec-MCTS with D-UCT fails to identify the optimal action sequence in the D-Chain problem.

Figures (25)

  • Figure 1: Simple regret of CB-MCTS and Dec-MCTS in the multi-agent D-chain problem with $D = 10$ and 2 agents.
  • Figure 2: Performance comparison on the Frozen Lake benchmark.
  • Figure 3: Performance comparison in the Oil Rigs Inspection problem.
  • Figure 4: An illustration of the D-chain problem on an $m$-ary tree in the multi-agent setting. Blue nodes are decision states, each with at most $m$ children; gray nodes are terminal states with their corresponding rewards.
  • Figure 5: Results of Dec-MCTS on the D-chain problem with $D=10$ and 2 agents for varying exploration bias $\varepsilon$ and discount factor $\gamma$.
  • ...and 20 more figures

Theorems & Definitions (6)

  • Definition 1
  • Lemma 1
  • Theorem 1
  • Theorem 2
  • Lemma 2
  • proof