BFTS: Thompson Sampling with Bayesian Additive Regression Trees

Ruizhe Deng, Bibhas Chakraborty, Ran Chen, Yan Shuo Tan

TL;DR

BFTS (Bayesian Forest Thompson Sampling) integrates Bayesian Additive Regression Trees (BART) into the Thompson Sampling framework for contextual bandits. The paper derives a Bayesian regret guarantee, $\mathbb{E}[\mathrm{Regret}_T] \le K\sigma\sqrt{2Tm\Psi_T}$, and a complementary minimax rate for a Feel-Good TS variant. Empirically, BFTS performs strongly on OpenML benchmarks and on data from the Drink Less mHealth trial, with calibrated uncertainty and interpretable models. The approach uses independent-arm BART priors, a separate-arm encoding strategy, and batched MCMC inference with a logarithmic refresh schedule to balance accuracy and computation. In offline evaluation on the Drink Less data, BFTS improves estimated engagement and policy value while offering principled uncertainty quantification and post-hoc feature-importance insights. The work points to a viable route to robust online personalization for non-linear, heterogeneous health-context data where online tuning is difficult.

Abstract

Contextual bandits are a core technology for personalized mobile health interventions, where decision-making requires adapting to complex, non-linear user behaviors. While Thompson Sampling (TS) is a preferred strategy for these problems, its performance hinges on the quality of the underlying reward model. Standard linear models suffer from high bias, while neural network approaches are often brittle and difficult to tune in online settings. Conversely, tree ensembles dominate tabular data prediction but typically rely on heuristic uncertainty quantification, lacking a principled probabilistic basis for TS. We propose Bayesian Forest Thompson Sampling (BFTS), the first contextual bandit algorithm to integrate Bayesian Additive Regression Trees (BART), a fully probabilistic sum-of-trees model, directly into the exploration loop. We prove that BFTS is theoretically sound, deriving an information-theoretic Bayesian regret bound of $\tilde{O}(\sqrt{T})$. As a complementary result, we establish frequentist minimax optimality for a "feel-good" variant, confirming the structural suitability of BART priors for non-parametric bandits. Empirically, BFTS achieves state-of-the-art regret on tabular benchmarks with near-nominal uncertainty calibration. Furthermore, in an offline policy evaluation on the Drink Less micro-randomized trial, BFTS improves engagement rates by over 30% compared to the deployed policy, demonstrating its practical effectiveness for behavioral interventions.
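
The exploration loop summarized above (one reward model per arm, a posterior draw per arm to pick the action, and posterior refreshes on a sparse schedule) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a conjugate Bayesian linear regression stands in for each per-arm BART posterior so that the block runs with NumPy alone, and the doubling refresh schedule is only one plausible reading of "logarithmic refresh schedule". The names `refresh_rounds`, `ArmPosterior`, and `run_ts` are hypothetical.

```python
import numpy as np


def refresh_rounds(T, base=2.0, first=8):
    """Geometrically spaced refit rounds, so the refit count grows like log(T).

    Assumption: the source does not give the exact schedule; doubling is a guess.
    """
    rounds, t = [], float(first)
    while t <= T:
        rounds.append(int(t))
        t *= base
    return rounds


class ArmPosterior:
    """Stand-in reward model for one arm (NOT BART).

    Conjugate Bayesian linear regression with a Gaussian prior on the weights.
    """

    def __init__(self, dim, noise_var=1.0, prior_var=10.0):
        self.dim, self.noise_var, self.prior_var = dim, noise_var, prior_var
        self.X, self.y = [], []
        self.mean = np.zeros(dim)
        self.cov = prior_var * np.eye(dim)

    def add(self, x, r):
        # Store the observation; the posterior is only updated in refresh().
        self.X.append(x)
        self.y.append(r)

    def refresh(self):
        # Batched posterior update (plays the role of an MCMC refit).
        if not self.X:
            return
        X, y = np.asarray(self.X), np.asarray(self.y)
        precision = np.eye(self.dim) / self.prior_var + X.T @ X / self.noise_var
        cov = np.linalg.inv(precision)
        self.cov = (cov + cov.T) / 2.0  # symmetrize against round-off
        self.mean = self.cov @ (X.T @ y) / self.noise_var

    def sample_reward(self, x, rng):
        # One posterior draw of the weights, evaluated at context x.
        w = rng.multivariate_normal(self.mean, self.cov)
        return float(x @ w)


def run_ts(contexts, reward_fn, n_arms, seed=0):
    """Thompson Sampling with separate per-arm models and sparse refreshes."""
    rng = np.random.default_rng(seed)
    T, dim = contexts.shape
    arms = [ArmPosterior(dim) for _ in range(n_arms)]
    refits = set(refresh_rounds(T))
    chosen = []
    for t in range(1, T + 1):
        x = contexts[t - 1]
        draws = [arm.sample_reward(x, rng) for arm in arms]
        a = int(np.argmax(draws))  # act greedily w.r.t. the sampled rewards
        arms[a].add(x, reward_fn(x, a, rng))
        chosen.append(a)
        if t in refits:
            for arm in arms:
                arm.refresh()
    return chosen
```

Between refresh rounds the sketch keeps sampling from the last fitted posterior, which is the trade-off a batched schedule makes: fewer expensive refits in exchange for slightly stale uncertainty estimates.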

Paper Structure

This paper contains 66 sections, 10 theorems, 127 equations, 13 figures, 5 tables, 2 algorithms.

Key Result

Theorem 1

Let $\Pi = \Pi_{\mathrm{BART}}^{\otimes K}$ be the product BART prior on $f_0$ with $m$ trees per arm. Under assumptions (A1)--(A3), the expected Bayesian regret of Ideal BFTS satisfies $\mathbb{E}[\mathrm{Regret}_T] \le K\sigma\sqrt{2Tm\Psi_T}$, where $\Psi_T = C_{\mathrm{str}}\log(p N_{\max}) + C_{\mathrm{leaf}} \log\bigl(1 + \frac{T}{4K\kappa^2\sigma^2}\bigr)$ is the information complexity term. Here, $C_{\mathrm{str}}$ and $C_{\mathrm{leaf}}$ depend only
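
To see how the bound scales, the sketch below evaluates the information complexity term $\Psi_T$ and the regret bound $K\sigma\sqrt{2Tm\Psi_T}$ stated in the TL;DR. The function names are hypothetical, and the constants $C_{\mathrm{str}}$, $C_{\mathrm{leaf}}$, $\kappa$ and the quantity $N_{\max}$ are not specified in this excerpt, so they are plain inputs here; any values passed in are illustrative only.

```python
import math


def info_complexity(T, K, p, n_max, kappa, sigma, c_str, c_leaf):
    """Psi_T = C_str * log(p * N_max) + C_leaf * log(1 + T / (4 K kappa^2 sigma^2))."""
    structure_term = c_str * math.log(p * n_max)
    leaf_term = c_leaf * math.log(1.0 + T / (4.0 * K * kappa**2 * sigma**2))
    return structure_term + leaf_term


def regret_bound(T, K, m, sigma, psi_T):
    """Upper bound K * sigma * sqrt(2 * T * m * Psi_T) on expected Bayesian regret."""
    return K * sigma * math.sqrt(2.0 * T * m * psi_T)
```

Because $\Psi_T$ grows only logarithmically in $T$, the bound grows like $\sqrt{T}$ up to logarithmic factors, which matches the $\tilde{O}(\sqrt{T})$ rate quoted in the abstract.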

Figures (13)

  • Figure 1: Cumulative regret trajectories on synthetic benchmarks. We report mean $\pm$ SD over $R=12$ replications up to $T=10{,}000$. Full synthetic benchmark regret curves are provided in the Appendix.
  • Figure 2: Coverage vs. interval length on synthetic benchmarks. We plot CI frontiers for representative nonlinear (Friedman) and linear scenarios. Each marker corresponds to an evaluation round $t\in\{200,500,1000,2000,5000,10000\}$ and lines connect rounds in temporal order (arrows pointing to later rounds). The red dashed line indicates the nominal $0.95$ coverage target (below: overconfident; above: conservative). Full uncertainty diagnostics are shown in the Appendix.
  • Figure 3: Estimated policy value on the Drink Less mHealth trial data. Policy value is estimated via SNIPS and is plotted as a function of horizon. BFTS achieves the highest estimated reward consistently along the horizon.
  • Figure 4: Simulation benchmarks: cumulative regret trajectories across all synthetic scenarios (mean $\pm$ SD over $R=12$ replications) up to $T=10{,}000$.
  • Figure 5: Simulation: posterior uncertainty diagnostics (coverage and calibration).
  • ...and 8 more figures
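
Figure 3 reports policy value estimated via SNIPS (self-normalized inverse propensity scoring). A minimal sketch of that estimator in its standard form is given below; the function name `snips_value` and its argument names are hypothetical, and the paper's exact evaluation pipeline is not shown in this excerpt.

```python
import numpy as np


def snips_value(rewards, target_probs, logging_probs):
    """Self-normalized IPS estimate of a target policy's value.

    rewards[i]       : observed reward for the logged action at step i
    target_probs[i]  : probability the target policy assigns to that action
    logging_probs[i] : probability the logging policy assigned to it (> 0)
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    total = weights.sum()
    if total == 0.0:
        raise ValueError("all importance weights are zero; estimate undefined")
    # Dividing by the weight sum (rather than n) is the self-normalization step.
    return float((weights * rewards).sum() / total)
```

When the target and logging probabilities coincide, all weights equal one and the estimate reduces to the plain average of the logged rewards.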

Theorems & Definitions (26)

  • Theorem 1: Bayesian regret of BFTS
  • proof
  • Lemma 2: Information gain of Bayesian forests
  • Remark 1: Why use arm-wise (separate) models?
  • Remark 2
  • proof
  • Lemma 3: Information gain under arm-wise BART prior
  • proof
  • Lemma 4: Posterior factorization under adaptive sampling
  • Remark 3
  • ...and 16 more