Boltzmann-based Exploration for Robust Decentralized Multi-Agent Planning

Nhat Nguyen; Duong Nguyen; Gianluca Rizzo; Hung Nguyen

TL;DR

Introduces Coordinated Boltzmann MCTS (CB-MCTS), which replaces the deterministic UCT rule with a stochastic Boltzmann policy and a decaying entropy bonus to keep exploration in multi-agent MCTS sustained yet focused.
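A short Python sketch can illustrate the selection idea described above. It is not the authors' implementation. The decay schedule $\beta_N = \beta_0 / \log(e + N)$ in `entropy_bonus_coef` and all function names are hypothetical. The sketch relies on a standard fact: the distribution that maximizes $\sum_a \pi(a) Q(a) + \beta H(\pi)$ is $\mathrm{softmax}(Q/\beta)$, so a decaying entropy bonus $\beta$ behaves like a decaying Boltzmann temperature. Actions are sampled rather than chosen by argmax, which is the key contrast with deterministic UCT.

```python
import math
import random


def entropy_bonus_coef(n_visits, beta0=1.0):
    # Hypothetical decay schedule: the entropy bonus shrinks as the node is
    # visited more, so exploration is broad at first and focuses later.
    return beta0 / math.log(math.e + n_visits)


def boltzmann_policy(q_values, beta):
    # softmax(Q / beta); this maximizes sum_a pi(a) * Q(a) + beta * H(pi).
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / beta) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(p):
    # Shannon entropy in nats, skipping zero-probability entries.
    return -sum(x * math.log(x) for x in p if x > 0)


def regularized_objective(p, q_values, beta):
    # Expected Q-value plus the entropy bonus (used to check optimality).
    return sum(pi * q for pi, q in zip(p, q_values)) + beta * entropy(p)


def select_action(q_values, n_visits, rng, beta0=1.0):
    # Sample an action instead of taking an argmax: every action keeps a
    # nonzero probability, unlike deterministic UCT selection.
    beta = entropy_bonus_coef(n_visits, beta0)
    probs = boltzmann_policy(q_values, beta)
    action = rng.choices(range(len(q_values)), weights=probs, k=1)[0]
    return action, probs


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.2, 0.5, 0.45]  # illustrative Q-estimates at one node
    for n in (0, 100, 10_000):
        action, probs = select_action(q, n, rng)
        print(n, action, [round(p, 3) for p in probs])
```

As the visit count $N$ grows, probability mass shifts toward the action with the highest estimate, but no action's probability ever reaches zero.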

Abstract

Decentralized Monte Carlo Tree Search (Dec-MCTS) is widely used for cooperative multi-agent planning but struggles in environments with sparse or skewed rewards. We introduce Coordinated Boltzmann MCTS (CB-MCTS), which replaces deterministic UCT with a stochastic Boltzmann policy and a decaying entropy bonus for sustained yet focused exploration. While Boltzmann exploration has been studied in single-agent MCTS, applying it to multi-agent systems poses unique challenges, and CB-MCTS is the first method to address them. We analyze CB-MCTS in the simple-regret setting and show in simulations that it outperforms Dec-MCTS in deceptive scenarios and remains competitive on standard benchmarks, providing a robust solution for multi-agent planning.
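For reference, the standard bandit definitions behind the simple-regret setting mentioned above are sketched below; the paper's exact multi-agent formulation may differ. After $n$ iterations the planner recommends $J_n$ (an action, or a joint action sequence), and its simple regret measures only the quality of that final recommendation. Cumulative regret, which UCB-style rules such as UCT are designed around, instead charges for every action $I_t$ played during search.

```latex
\mathbb{E}[r_n] = \mu^{*} - \mathbb{E}\left[\mu_{J_n}\right],
\qquad
R_n = n\,\mu^{*} - \sum_{t=1}^{n} \mathbb{E}\left[\mu_{I_t}\right],
\qquad
\mu^{*} = \max_{a} \mu_a
```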

Paper Structure

This paper contains 23 sections, 4 theorems, 40 equations, 25 figures, 2 tables, and 2 algorithms.

Table of Contents

Introduction
Problem Statement
Coordinated Boltzmann MCTS
Distributed CB-MCTS with Discounted Backup
Boltzmann Selection Policy
Simple Regret Analysis
Empirical Evaluation
Frozen Lake Problem
Oil Rigs Inspection Problem
Conclusion
Acknowledgments
Appendix
Technical Results
Notations and Assumptions
Preliminary Results
...and 8 more sections

Key Result

Lemma 1

For a fixed $\gamma$, there exists a value $D$ such that Dec-MCTS with D-UCT fails to identify the optimal action sequence in the D-Chain problem.
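The deception that the D-chain problem exploits can be made concrete with a single-agent sketch. The reward layout is an assumption for illustration, not the paper's: at decision depth $d$ (0-indexed) an exit action pays $(D-1-d)/D$, and continuing through all $D$ decision nodes pays 1. The multi-agent version on an $m$-ary tree shown in Figure 4 may differ. The function names are hypothetical.

```python
def exit_reward(d, depth):
    # Assumed layout: exiting at decision depth d (0-indexed) pays (D - 1 - d) / D.
    return (depth - 1 - d) / depth


FINAL_REWARD = 1.0  # assumed reward for continuing through all D decision nodes


def random_rollout_value(d, depth):
    """Expected return of a uniformly random policy started at depth d."""
    value = FINAL_REWARD  # value just past the last decision node
    for k in range(depth - 1, d - 1, -1):
        # At depth k: exit with probability 1/2, otherwise continue to k + 1.
        value = 0.5 * exit_reward(k, depth) + 0.5 * value
    return value


def optimal_value(d, depth):
    """Value of the best policy from depth d: max(exit now, keep going)."""
    value = FINAL_REWARD
    for k in range(depth - 1, d - 1, -1):
        value = max(exit_reward(k, depth), value)
    return value


if __name__ == "__main__":
    D = 10
    print("exit at root:            ", exit_reward(0, D))
    print("continue, random rollout:", random_rollout_value(1, D))
    print("continue, optimal:       ", optimal_value(1, D))
```

Under this layout with $D = 10$, exiting at the root pays 0.9, while a uniformly random rollout after continuing is worth only about 0.70, even though always continuing is optimal with value 1. Estimates built from early rollouts therefore favor the wrong branch.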

Figures (25)

Figure 1: Simple regret of CB-MCTS and Dec-MCTS in the multi-agent D-chain problem with $D = 10$ and 2 agents.
Figure 2: Performance comparison on the Frozen Lake benchmark.
Figure 3: Performance comparison in the Oil Rigs Inspection problem.
Figure 4: An illustration of the multi-agent D-chain problem on an $m$-ary tree. Blue nodes are decision states, each with at most $m$ children; gray nodes are terminal states with their corresponding rewards.
Figure 5: Results of Dec-MCTS on the D-chain problem with $D=10$ and 2 agents for varying exploration bias $\varepsilon$ and discount factor $\gamma$.
...and 20 more figures
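Dec-MCTS, the baseline in Lemma 1 and Figure 5, selects actions with discounted UCT (D-UCT). The sketch below shows discounted-UCB selection at a single node, assuming a form in which past visit counts and reward sums are multiplied by the discount factor $\gamma$ before each update and the exploration term is scaled by the exploration bias $\varepsilon$. The class name and the exact bonus are assumptions, not Dec-MCTS's code.

```python
import math


class DiscountedUCB:
    """Hypothetical discounted-UCB (D-UCT style) selection at one tree node."""

    def __init__(self, n_actions, gamma=0.95, epsilon=1.0):
        self.gamma = gamma      # discount applied to past statistics
        self.epsilon = epsilon  # exploration bias (assumed role)
        self.counts = [0.0] * n_actions  # discounted visit counts
        self.sums = [0.0] * n_actions    # discounted reward sums

    def select(self):
        # Try every action once before using the index.
        for a, n in enumerate(self.counts):
            if n == 0.0:
                return a
        total = sum(self.counts)  # >= 1 once any update has happened

        def index(a):
            mean = self.sums[a] / self.counts[a]
            bonus = self.epsilon * math.sqrt(math.log(total) / self.counts[a])
            return mean + bonus

        return max(range(len(self.counts)), key=index)

    def update(self, action, reward):
        # Fade all past statistics, then add the new observation.
        self.counts = [self.gamma * n for n in self.counts]
        self.sums = [self.gamma * s for s in self.sums]
        self.counts[action] += 1.0
        self.sums[action] += reward


if __name__ == "__main__":
    node = DiscountedUCB(n_actions=3, gamma=0.9, epsilon=1.0)
    for _ in range(200):
        a = node.select()
        node.update(a, 1.0 if a == 0 else 0.0)  # action 0 is best in this toy
    print([round(n, 2) for n in node.counts], round(sum(node.counts), 2))
```

Because each update multiplies old statistics by $\gamma$, the total effective count in this sketch never exceeds $1/(1-\gamma)$, so the node behaves as if it remembered only roughly the last $1/(1-\gamma)$ rollouts. This helps when teammates' plans change, but it also caps how much evidence any branch can accumulate.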

Theorems & Definitions (6)

Definition 1
Lemma 1
Theorem 1
Theorem 2
Lemma 2
Proof
