Gaussian Process Aggregation for Root-Parallel Monte Carlo Tree Search with Continuous Actions
Junlin Xiao, Victor-Alexandru Darvariu, Bruno Lacerda, Nick Hawes
TL;DR
The paper tackles the challenge of aggregating results in root-parallel MCTS for continuous action spaces. It introduces GPR2P, a Gaussian Process Regression-based aggregation that interpolates returns over the entire action space and uses the predictive mean to select actions. A reliability threshold determines which actions are retained for GP fitting, and an RBF kernel lets the fitted model generalize to actions that were never trialed. Across six diverse environments, including both deterministic and stochastic transitions, GPR2P consistently outperforms prior aggregation strategies, particularly at low trial budgets, with a modest inference-time overhead. The work demonstrates the practical impact of principled interpolation in online planning for continuous domains and suggests avenues for integrating GP guidance with per-thread decision making.
Abstract
Monte Carlo Tree Search is a cornerstone algorithm for online planning, and its root-parallel variant is widely used when wall-clock time is limited but best performance is desired. In environments with continuous action spaces, how to best aggregate statistics from different threads is an important yet underexplored question. In this work, we introduce a method that uses Gaussian Process Regression to obtain value estimates for promising actions that were not trialed in the environment. We perform a systematic evaluation across six different domains, demonstrating that our approach outperforms existing aggregation strategies while requiring a modest increase in inference time.
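
The aggregation idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a one-dimensional action space, treats the reliability threshold as a simple minimum visit count, and evaluates the GP predictive mean on the retained actions plus a uniform grid. The names fit_gp_mean, select_action, min_visits, noise, and length_scale are hypothetical and chosen only for illustration.

```python
import math

def rbf(a, b, length_scale):
    """RBF (squared-exponential) kernel between two scalar actions."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_gp_mean(thread_stats, min_visits=2, noise=1e-2, length_scale=0.2):
    """Pool root statistics from all threads and fit a GP predictive mean.

    thread_stats: one list per thread of (action, mean_return, visits).
    min_visits is an ASSUMED stand-in for the paper's reliability threshold.
    """
    kept = [(a, q) for stats in thread_stats
            for (a, q, n) in stats if n >= min_visits]
    if not kept:
        raise ValueError("no root action passed the visit-count filter")
    actions = [a for a, _ in kept]
    returns = [q for _, q in kept]
    prior_mean = sum(returns) / len(returns)  # constant prior mean (assumption)
    # Kernel matrix with observation noise on the diagonal.
    K = [[rbf(ai, aj, length_scale) + (noise if i == j else 0.0)
          for j, aj in enumerate(actions)]
         for i, ai in enumerate(actions)]
    alpha = solve(K, [q - prior_mean for q in returns])

    def predictive_mean(a):
        return prior_mean + sum(w * rbf(a, ai, length_scale)
                                for w, ai in zip(alpha, actions))

    return predictive_mean, actions

def select_action(thread_stats, low=0.0, high=1.0, grid_size=101, **gp_kwargs):
    """Pick the candidate with the highest GP predictive mean.

    Candidates are the retained actions plus a uniform grid (assumption),
    so actions that no thread trialed can still be selected.
    """
    mean_fn, kept = fit_gp_mean(thread_stats, **gp_kwargs)
    grid = [low + (high - low) * i / (grid_size - 1) for i in range(grid_size)]
    return max(kept + grid, key=mean_fn)

# Toy root statistics from three threads: (action, mean return, visit count).
thread_stats = [
    [(0.1, 0.2, 10), (0.9, 0.1, 8)],
    [(0.4, 0.8, 12), (0.95, 5.0, 1)],  # single-visit outlier is filtered out
    [(0.6, 0.9, 9), (0.3, 0.6, 7)],
]
best = select_action(thread_stats)
print("selected action:", best)
```

In this toy example the single-visit action with an implausibly high return is discarded before fitting, and the selected action may be one that no thread sampled. A practical version would more likely rely on a library such as scikit-learn's GaussianProcessRegressor with an RBF kernel, and on a search over multi-dimensional actions rather than a grid.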
