Online Learning of Decision Trees with Thompson Sampling
Ayman Chaouki, Jesse Read, Albert Bifet
TL;DR
This work tackles online learning of optimal Decision Trees (DTs) from streaming categorical data by formulating DT construction as an undiscounted episodic Markov Decision Process (MDP) and solving it with Thompson Sampling Decision Trees (TSDT), a Monte Carlo Tree Search (MCTS) algorithm based on Thompson Sampling. The approach maintains Bayesian posteriors at leaves and internal nodes to estimate the value of partial trees. It comes in two variants, TSDT (which uses Clark's approximation) and Fast-TSDT (which backpropagates efficiently using the best-mean child), and the work proves almost-sure convergence to the optimal online DT. Empirically, TSDT and especially Fast-TSDT outperform online greedy methods such as VFDT and EFDT and match or exceed batch-optimal DT algorithms on benchmarks, while handling data streams without reprocessing the full batch. The method combines principled online optimization with interpretable trees and scalable performance; finite-time guarantees and extensions to alternative MCTS policies are left to future work.
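The selection and backup ideas summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Node` class, the Gaussian form of the posteriors, and the function names `thompson_select` and `backprop_best_mean` are all assumptions made for illustration. The sketch only shows the general pattern of sampling one value per child from its posterior, and a Fast-TSDT-style backup in which the parent takes the posterior of the child with the highest mean.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search node with an assumed Gaussian posterior over its value."""
    mean: float = 0.5
    var: float = 1.0
    children: list = field(default_factory=list)


def thompson_select(node, rng):
    """Draw one sample per child posterior; return the index of the largest sample."""
    samples = [rng.gauss(c.mean, c.var ** 0.5) for c in node.children]
    return max(range(len(samples)), key=samples.__getitem__)


def backprop_best_mean(node):
    """Assumed Fast-style backup: the parent adopts the posterior of its best-mean child."""
    best = max(node.children, key=lambda c: c.mean)
    node.mean, node.var = best.mean, best.var


# Illustrative values only; they are not taken from the paper.
rng = random.Random(0)
root = Node(children=[Node(0.60, 0.01), Node(0.75, 0.02), Node(0.40, 0.05)])
chosen = thompson_select(root, rng)  # index of the child to explore next
backprop_best_mean(root)             # root now mirrors the child with mean 0.75
```

Because selection is driven by posterior samples rather than point estimates, a child with a lower mean but higher variance can still be chosen occasionally, which is what drives exploration in this kind of scheme.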
Abstract
Decision Trees are prominent prediction models for interpretable Machine Learning. They have been thoroughly researched, mostly in the batch setting with a fixed labelled dataset, leading to popular algorithms such as C4.5, ID3 and CART. Unfortunately, these methods are heuristic in nature: they rely on greedy splits, which offer no guarantee of global optimality and often lead to unnecessarily complex and hard-to-interpret Decision Trees. Recent breakthroughs have addressed this suboptimality issue in the batch setting, but no such work has considered the online setting, where data arrive in a stream. To this end, we devise a new Monte Carlo Tree Search algorithm, Thompson Sampling Decision Trees (TSDT), able to produce optimal Decision Trees in an online setting. We analyse our algorithm and prove its almost sure convergence to the optimal tree. Furthermore, we conduct extensive experiments to validate our findings empirically. The proposed TSDT outperforms existing algorithms on several benchmarks, while having the practical advantage of being tailored to the online setting.
