Monte Carlo Tree Search in the Presence of Transition Uncertainty
Farnaz Kohankhaki, Kiarash Aghakasiri, Hongming Zhang, Ting-Han Wei, Chao Gao, Martin Müller
TL;DR
This work addresses decision-making when the environment model is imperfect by introducing Uncertainty Adapted MCTS (UA-MCTS), which learns a transition-uncertainty function and uses it to steer search away from unreliable transitions. It modifies all four MCTS phases (selection, expansion, simulation, backpropagation) to take the uncertainty estimates into account and proves a completeness property. In the corrupted bandit setting, the uncertainty-adapted UA-UCB is proven to have a tighter regret bound than standard UCB. Empirically, UA-MCTS substantially improves on standard MCTS in the deterministic MinAtar games, especially when the uncertainty is learned online, and it often approaches the performance of planning with the true model despite model errors. A key finding is that learning a compact uncertainty model can outperform attempting to learn full transition corrections, which makes the approach relevant to real-world planning under model misspecification.
Abstract
Monte Carlo Tree Search (MCTS) is an immensely popular search-based framework used for decision making. It is traditionally applied to domains where a perfect simulation model of the environment is available. We study and improve MCTS in the setting where the environment model is given but imperfect. We show that the discrepancy between the model and the actual environment can lead to significant performance degradation with standard MCTS. We therefore develop Uncertainty Adapted MCTS (UA-MCTS), a more robust algorithm within the MCTS framework. We estimate the transition uncertainty in the given model and direct the search towards more certain transitions in the state space. We modify all four MCTS phases to improve the search behavior by considering these estimates. We prove, in the corrupted bandit case, that adding uncertainty information to adapt UCB leads to a tighter regret bound than standard UCB. Empirically, we evaluate UA-MCTS and its individual components on the deterministic domains from the MinAtar test suite. Our results demonstrate that UA-MCTS strongly improves MCTS in the presence of model transition errors.
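
To make the idea of directing search towards more certain transitions concrete, the following is a minimal sketch of an uncertainty-discounted UCB selection rule. It is an illustration only, not the formula from the paper: the function names ua_ucb_score and select_action, the per-action uncertainty value u in [0, 1], and the choice to scale the exploration bonus by (1 - u) are all assumptions made for this example.

```python
import math


def ua_ucb_score(mean_value, parent_visits, child_visits, uncertainty,
                 c=math.sqrt(2)):
    """Hypothetical uncertainty-discounted UCB score (not the paper's formula).

    `uncertainty` is an assumed estimate in [0, 1] of how unreliable the
    model's transition for this action is. The standard UCB exploration
    bonus is scaled by (1 - uncertainty), so uncertain transitions receive
    less exploration.
    """
    if child_visits == 0:
        # Unvisited actions are tried first, as in standard UCB.
        return math.inf
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return mean_value + (1.0 - uncertainty) * exploration


def select_action(stats, parent_visits):
    """Pick the action with the highest uncertainty-discounted score.

    `stats` maps each action to a dict with keys "q" (mean value),
    "n" (visit count), and "u" (assumed transition uncertainty).
    """
    return max(
        stats,
        key=lambda a: ua_ucb_score(
            stats[a]["q"], parent_visits, stats[a]["n"], stats[a]["u"]
        ),
    )


# Two actions with identical value estimates and visit counts; only the
# assumed transition uncertainty differs.
stats = {
    "left": {"q": 0.5, "n": 10, "u": 0.9},
    "right": {"q": 0.5, "n": 10, "u": 0.1},
}
chosen = select_action(stats, parent_visits=20)
```

With equal value estimates and visit counts, this rule prefers the action whose transition is estimated to be more reliable, and with an uncertainty of zero it reduces to the standard UCB score.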
