Online Learning of Decision Trees with Thompson Sampling

Ayman Chaouki; Jesse Read; Albert Bifet

TL;DR

This work tackles online learning of optimal Decision Trees (DTs) from streaming categorical data by formulating DT construction as an undiscounted episodic MDP and solving it with a Monte Carlo Tree Search algorithm based on Thompson Sampling, called Thompson Sampling Decision Trees (TSDT). The approach maintains Bayesian posteriors at the leaves and internal nodes of the search tree to estimate the value of partial trees. It comes in two variants: TSDT, which backpropagates posteriors with Clark's approximation, and Fast-TSDT, which uses a cheaper backpropagation based on the best-mean child. Both are proven to converge almost surely to the optimal online DT. Empirically, TSDT and especially Fast-TSDT outperform the online greedy methods VFDT and EFDT, and achieve comparable or superior performance to batch-optimal DT algorithms on benchmarks, while handling data streams without reprocessing the full batch. The method yields interpretable trees through principled online optimization; finite-time guarantees and alternative MCTS policies are left to future work.
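To make the difference between the two backpropagation styles concrete, here is a minimal Python sketch. It rests on assumptions that are not stated in this summary: each child's value posterior is treated as an independent Gaussian given as a `(mean, variance)` pair, the Clark-style backup moment-matches the maximum of the children pairwise, and the best-mean backup simply copies the posterior of the child with the highest mean. The function names (`clark_max`, `backup_clark`, `backup_best_mean`) are hypothetical and are not taken from the paper.

```python
import math
from functools import reduce
from typing import List, Tuple

Gaussian = Tuple[float, float]  # (mean, variance); an assumed parametrisation


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clark_max(a: Gaussian, b: Gaussian) -> Gaussian:
    """Moment-match max(X, Y) for independent Gaussians X and Y (Clark, 1961)."""
    m1, v1 = a
    m2, v2 = b
    s = math.sqrt(v1 + v2)
    if s == 0.0:
        # Both values are deterministic, so the max is deterministic too.
        return (max(m1, m2), 0.0)
    alpha = (m1 - m2) / s
    first = m1 * _cdf(alpha) + m2 * _cdf(-alpha) + s * _pdf(alpha)
    second = ((m1 * m1 + v1) * _cdf(alpha)
              + (m2 * m2 + v2) * _cdf(-alpha)
              + (m1 + m2) * s * _pdf(alpha))
    # Clamp to guard against tiny negative values from rounding.
    return (first, max(second - first * first, 0.0))


def backup_clark(children: List[Gaussian]) -> Gaussian:
    """Fold Clark's two-variable approximation over all children."""
    return reduce(clark_max, children)


def backup_best_mean(children: List[Gaussian]) -> Gaussian:
    """Cheaper backup: reuse the posterior of the child with the best mean."""
    return max(children, key=lambda g: g[0])


if __name__ == "__main__":
    kids = [(0.60, 0.01), (0.80, 0.04), (0.70, 0.0)]
    print("Clark backup:    ", backup_clark(kids))
    print("Best-mean backup:", backup_best_mean(kids))
```

The pairwise fold is only an approximation, because the maximum of two Gaussians is not itself Gaussian, whereas the best-mean backup avoids the density and distribution evaluations entirely. This is one plausible reading of why the second variant is described as the efficient one.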

Abstract

Decision Trees are prominent prediction models for interpretable Machine Learning. They have been thoroughly researched, mostly in the batch setting with a fixed labelled dataset, leading to popular algorithms such as C4.5, ID3 and CART. Unfortunately, these methods are heuristic in nature: they rely on greedy splits that offer no guarantee of global optimality and often lead to unnecessarily complex and hard-to-interpret Decision Trees. Recent breakthroughs addressed this suboptimality issue in the batch setting, but no such work has considered the online setting, where data arrive in a stream. To this end, we devise a new Monte Carlo Tree Search algorithm, Thompson Sampling Decision Trees (TSDT), able to produce optimal Decision Trees in an online setting. We analyse our algorithm and prove its almost sure convergence to the optimal tree. Furthermore, we conduct extensive experiments to validate our findings empirically. The proposed TSDT outperforms existing algorithms on several benchmarks, while having the practical advantage of being tailored to the online setting.

Paper Structure (18 sections, 8 theorems, 98 equations, 18 figures, 1 table, 1 algorithm)

INTRODUCTION
RELATED WORK
PROBLEM FORMULATION
Markov Decision Process (MDP)
Tree representation of the State-Action Space
TSDT
Estimating $\mathcal{V}^{\pi^*}\left( \overline{T}\right)$ for a Search Leaf $\overline{T}$
Estimating $\mathcal{V}^{\pi^*}\left( T\right)$ for an internal Search Node $T$
The Algorithm
Estimator $\hat{p}\left( l\right)$
EXPERIMENTS
CONCLUSIONS, LIMITATIONS AND FUTURE WORK
Table of Notations
Experiments
Synthetic Experiment:
...and 3 more sections

Key Result

Theorem 1

Let time $t$ denote the number of iterations of TSDT and Fast-TSDT, then any Search Node $T$ satisfies the following: and any internal Search Node $T$ satisfies:

Figures (18)

Figure 1: Each Search Node is a state and each edge an action. The left-most edge is the terminal action, which is why the parent and child Search Nodes it connects represent the same DT. The remaining edges are split actions with respect to the black leaf.
Figure 2: One iteration of TSDT. The Search Node in dashed lines is the Search Leaf $T^{\left( N\right)} = \overline{T^{\left( N-1\right)}}$.
Figure 3: The Weights Degeneracy phenomenon.
Figure 4: Comparison of VFDT, EFDT, TSDT and Fast-TSDT. Left: Frequency of perfect convergence; Right: Average running time in seconds.
Figure 5: Cross-validation test accuracy comparison between TSDT, Fast-TSDT, OSDT and DL8.5 as a function of the number of leaves.
...and 13 more figures

Theorems & Definitions (16)

Theorem 1
Theorem 2
Lemma 3
Proof of Lemma 3
Lemma 4
Proof of Lemma 4
Corollary 5
Proof of Corollary 5
Lemma 6
Proof of Lemma 6
...and 6 more
