Tree Ensembles for Contextual Bandits
Hannes Nilsson, Rikard Johansson, Niklas Åkerblom, Morteza Haghir Chehreghani
TL;DR
This work introduces a practical framework that integrates tree ensembles with contextual and combinatorial bandits by adapting the Upper Confidence Bound (UCB) and Thompson Sampling (TS) strategies. By modeling uncertainty at the leaf level of each tree and aggregating it across the ensemble, the authors develop TEUCB and TETS. Both methods apply to XGBoost and random forests and extend to combinatorial settings. Empirical results on UCI benchmarks and on a road-network navigation task in Luxembourg show low regret and shorter runtimes than neural-network baselines, highlighting how well a single tree ensemble generalizes across arms. The study emphasizes practical applicability and leaves theoretical regret analyses and broader real-world deployments of tree-ensemble bandits to future work.
Abstract
We propose a new framework for contextual multi-armed bandits based on tree ensembles. Our framework adapts two widely used bandit methods, Upper Confidence Bound (UCB) and Thompson Sampling (TS), to both standard and combinatorial settings. As part of this framework, we propose a novel method for estimating the uncertainty of tree ensemble predictions. We demonstrate the effectiveness of the framework in several experimental studies using XGBoost and random forests, two popular tree ensemble methods. Compared to state-of-the-art methods based on decision trees and neural networks, our methods achieve lower regret and shorter runtimes on benchmark datasets and on a real-world application: navigation over road networks.
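The abstract does not spell out the uncertainty estimator. What follows is therefore only a minimal sketch of one plausible leaf-level scheme, not the authors' TEUCB or TETS implementation. It makes three assumptions. First, each tree maps a context to a leaf id, for example through a fitted ensemble's leaf-assignment method. Second, a leaf's uncertainty is the variance of the mean of the training targets routed to it, plus a hypothetical prior term `prior_var`. Third, trees are independent, so the variance of the averaged prediction is the sum of leaf variances divided by the squared number of trees. All function names are hypothetical.

```python
import math
import random
from collections import defaultdict
from statistics import mean, pvariance


def leaf_stats(leaf_ids, targets, prior_var=1.0):
    """Per-tree map: leaf id -> (mean target, variance of that mean).

    leaf_ids[t][i] is the leaf that training sample i reaches in tree t.
    prior_var is a hypothetical regularizer (an assumption, not from the
    paper) so that single-sample leaves still carry some uncertainty.
    """
    stats = []
    for tree_leaves in leaf_ids:
        groups = defaultdict(list)
        for leaf, y in zip(tree_leaves, targets):
            groups[leaf].append(y)
        stats.append({
            leaf: (mean(ys), (pvariance(ys) + prior_var) / len(ys))
            for leaf, ys in groups.items()
        })
    return stats


def ensemble_posterior(stats, point_leaves):
    """Average the leaf means over trees; assuming independent trees,
    the variance of that average is the sum of leaf variances / M^2."""
    m = len(stats)
    mu = sum(stats[t][leaf][0] for t, leaf in enumerate(point_leaves)) / m
    var = sum(stats[t][leaf][1] for t, leaf in enumerate(point_leaves)) / m ** 2
    return mu, var


def ucb_score(mu, var, beta=1.0):
    # Optimism under uncertainty: mean plus a scaled standard deviation.
    return mu + beta * math.sqrt(var)


def thompson_sample(mu, var, rng=random):
    # Draw one reward estimate from a Gaussian approximation.
    return rng.gauss(mu, math.sqrt(var))


def select_arm(stats, arm_leaves, score):
    """Return the index of the arm whose (mu, var) gets the highest score."""
    scores = [score(*ensemble_posterior(stats, leaves)) for leaves in arm_leaves]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    leaf_ids = [[0, 0, 1, 1], [0, 1, 1, 1]]  # 2 trees, 4 training samples
    targets = [1.0, 3.0, 5.0, 7.0]
    stats = leaf_stats(leaf_ids, targets, prior_var=0.0)
    arms = [[0, 0], [1, 1]]  # leaf ids of each arm's context in each tree
    print(select_arm(stats, arms, ucb_score))  # 1
    rng = random.Random(0)
    print(select_arm(stats, arms, lambda mu, var: thompson_sample(mu, var, rng)))
```

In this toy example, each arm's context is represented only by its leaf ids. In a real ensemble, those ids would come from routing the arm's feature vector through every tree. If `ucb_score` is replaced by `thompson_sample` in the selection step, the same statistics drive a Thompson Sampling variant instead of a UCB one.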
