Tree Ensembles for Contextual Bandits

Hannes Nilsson; Rikard Johansson; Niklas Åkerblom; Morteza Haghir Chehreghani

TL;DR

This work introduces a practical framework that integrates tree ensembles with contextual and combinatorial bandits by adapting UCB and Thompson Sampling strategies. By modeling uncertainty at the leaf level of trees and aggregating across an ensemble, the authors develop TEUCB and TETS, applicable to XGBoost and Random Forests, and extendable to combinatorial settings. Empirical results on UCI benchmarks and a Luxembourg road-network navigation task demonstrate strong regret performance and favorable computational efficiency compared to neural baselines, highlighting the generalization ability of a single tree ensemble across arms. The study emphasizes practical applicability and paves the way for theoretical regret analyses and broader real-world deployments of tree-ensemble bandits.
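The leaf-level idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each tree contributes the statistics of the single leaf that a context falls into (a sample count, a mean output, and a sample variance), that tree outputs are summed as in boosting, and that trees are treated as independent so the variances of the leaf means add. The names `LeafStats`, `ensemble_moments`, `ucb_score`, `ts_score`, `select_arm`, and the exploration factor `nu` are all hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class LeafStats:
    """Hypothetical per-leaf summary: sample count, mean output, sample variance."""
    n: int
    mean: float
    var: float


def ensemble_moments(leaves):
    """Aggregate the leaves one context falls into (one leaf per tree).

    Assumption: tree outputs are summed and treated as independent,
    so the means add and the variances of the leaf means (var / n) add.
    """
    mu = sum(leaf.mean for leaf in leaves)
    sigma2 = sum(leaf.var / leaf.n for leaf in leaves)
    return mu, sigma2


def ucb_score(leaves, nu=1.0):
    """UCB-style score: estimated reward plus nu standard deviations."""
    mu, sigma2 = ensemble_moments(leaves)
    return mu + nu * math.sqrt(sigma2)


def ts_score(leaves, rng):
    """Thompson-style score: one Gaussian draw around the estimated reward."""
    mu, sigma2 = ensemble_moments(leaves)
    return rng.gauss(mu, math.sqrt(sigma2))


def select_arm(arm_leaves, score):
    """Pick the arm whose (context, arm) leaves give the highest score."""
    return max(range(len(arm_leaves)), key=lambda a: score(arm_leaves[a]))


# Two arms, two trees each. Arm 0 is well explored; arm 1 has few samples.
arms = [
    [LeafStats(n=100, mean=0.30, var=0.04), LeafStats(n=100, mean=0.20, var=0.04)],
    [LeafStats(n=4, mean=0.25, var=0.16), LeafStats(n=4, mean=0.20, var=0.16)],
]

greedy_choice = select_arm(arms, lambda leaves: ensemble_moments(leaves)[0])
ucb_choice = select_arm(arms, ucb_score)
ts_choice = select_arm(arms, lambda leaves: ts_score(leaves, random.Random(0)))
```

In this toy setup the greedy rule prefers arm 0 because its mean estimate is higher, while the UCB-style rule prefers arm 1 because its leaves hold few samples and therefore carry a larger uncertainty bonus. The Thompson-style choice depends on the random draw.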

Abstract

We propose a new framework for contextual multi-armed bandits based on tree ensembles. Our framework adapts two widely used bandit methods, Upper Confidence Bound and Thompson Sampling, for both standard and combinatorial settings. As part of this framework, we propose a novel method of estimating the uncertainty in tree ensemble predictions. We further demonstrate the effectiveness of our framework via several experimental studies, employing XGBoost and random forests, two popular tree ensemble methods. Compared to state-of-the-art methods based on decision trees and neural networks, our methods exhibit superior performance in terms of both regret minimization and computational runtime, when applied to benchmark datasets and the real-world application of navigation over road networks.

Paper Structure (33 sections, 11 equations, 7 figures, 4 tables, 3 algorithms)

Introduction
Related Work
Background
Multi-Armed Bandit Problem
Contextual Bandit Problem
Combinatorial Bandits
Proposed Algorithms
Tree-Based Weak Learners
Uncertainty Modeling
Tree Ensemble Upper Confidence Bound
Tree Ensemble Thompson Sampling
Extension to Combinatorial Bandits
Adaptation to XGBoost
Adaptation to Random Forest
Experiments
...and 18 more sections

Figures (7)

Figure 1: Comparison of contextual MAB algorithms on UCI datasets. Figures \ref{Fig1:adult}, \ref{Fig1:magic}, \ref{Fig1:mushroom}, and \ref{Fig1:shuttle} share the same color scheme for consistency, but the legend is only presented in \ref{Fig1:adult} for improved visibility.
Figure 2: Experimental results on real-world road network navigation in Luxembourg.
Figure 3: The road network of Luxembourg, showing the trajectories of different agents for the experiment on problem instance 1, where the corresponding cumulative regret is presented in \ref{fig:cum_reg_1175}. The plots show all the paths selected by the agents during a full run of the experiment, where a higher level of opacity indicates that a road segment was more frequently part of a traveled path.
Figure 4: Comparison of TEUCB, TETS, and TreeBootstrap on the mushroom and the adult datasets with different levels of delays. Cumulative regret for a single experiment is calculated over 10,000 time steps and repeated 10 times. The results display the average cumulative regret plus/minus the standard deviation for each respective agent. Hyperparameters are selected as in \ref{sec:contextual_experiment}. All agents are evaluated on the same levels of reward delays, but each agent is shifted slightly horizontally for visualization purposes.
Figure 5: Comparison of TEUCB and TETS on the mushroom and the adult datasets with different tree depths. Cumulative regret for a single experiment is calculated over 10,000 time steps and repeated 10 times. The results display the average cumulative regret plus/minus the standard deviation for each respective agent. Hyperparameters other than tree depth are selected as in \ref{sec:contextual_experiment}. All agents are evaluated with the same tree depths, but each agent is shifted slightly horizontally for visualization purposes.
...and 2 more figures
