Boosting-Based Sequential Meta-Tree Ensemble Construction for Improved Decision Trees

Ryota Maniwa; Naoki Ichijo; Yuta Nakahara; Toshiyasu Matsushima

TL;DR

This work tackles overfitting in decision trees by adopting meta-trees within a Bayesian framework to obtain Bayes-optimal predictions. It extends to boosting ensembles of meta-trees, constructing them sequentially by minimizing residuals and leveraging either GBDT-like residual learning or posterior-based weighting over meta-trees, with shrinkage as a regularizer. The authors introduce several variants, including MT_gbdt, MT_uni-uni, MT_uni-pos, and MT_pos-pos, and demonstrate through synthetic and benchmark experiments that ensembles of meta-trees can achieve lower Bayes risk and better generalization than traditional ensembles like GBDT and LightGBM, particularly when allowing deeper trees. The practical impact is improved predictive performance and robustness to overfitting in tree-based models, with a flexible framework for weighting meta-trees via uniform or posterior distributions over the explanatory features and their thresholds.
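The GBDT-like residual learning with shrinkage mentioned above can be illustrated with a minimal sketch. This is not the authors' method: the base learner here is a plain depth-1 regression stump standing in for a meta-tree, and the names `fit_stump`, `boost`, `mse`, `n_rounds`, and `shrinkage` are hypothetical, chosen only for this illustration. It shows only the sequential loop in which each new learner is fit to the current residuals and added with a shrinkage factor.

```python
import random


def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (a single threshold) by least squares.

    Stand-in base learner (assumption): the paper uses meta-trees instead.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # every split that leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all inputs identical: fall back to the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def boost(xs, ys, n_rounds=50, shrinkage=0.1):
    """Sequentially fit learners to residuals under squared error loss."""
    base = sum(ys) / len(ys)          # initial constant prediction
    learners = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)  # the next learner targets what is left over
        learners.append(h)
        preds = [p + shrinkage * h(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + shrinkage * sum(h(x) for h in learners)

    return predict


def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data (assumption): a noisy quadratic on [0, 2).
random.seed(0)
xs = [i / 50 for i in range(100)]
ys = [x * x + random.gauss(0, 0.05) for x in xs]

y_mean = sum(ys) / len(ys)
baseline = mse(lambda x: y_mean, xs, ys)  # error of the constant predictor
model = boost(xs, ys)
train_mse = mse(model, xs, ys)
```

A smaller `shrinkage` makes each round contribute less, which generally calls for more rounds but acts as a regularizer. The sketch measures training error only and says nothing about the generalization behavior reported in the paper.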

Abstract

Decision trees are among the most popular models in machine learning. However, they overfit when grown too deep. The recently proposed meta-tree addresses this overfitting and, moreover, guarantees statistical optimality under Bayes decision theory, so a meta-tree is expected to outperform a single decision tree. Ensembles of decision trees, typically constructed by boosting algorithms, are known to predict better than a single decision tree. Ensembles of meta-trees are therefore expected to predict better than a single meta-tree, yet no previous study has constructed multiple meta-trees by boosting. In this study, we propose a method that constructs multiple meta-trees using a boosting approach. Through experiments on synthetic and benchmark datasets, we compare the proposed methods with conventional methods that use ensembles of decision trees. Furthermore, whereas ensembles of decision trees can overfit just as a single decision tree can, our experiments confirm that ensembles of meta-trees prevent the overfitting caused by tree depth.

Paper Structure (24 sections, 4 theorems, 24 equations, 7 figures, 2 tables, 1 algorithm)

Introduction
Preliminaries
Decision tree as a stochastic model
Prediction with a single meta-tree
Problem setup
The Bayes optimal prediction using a meta-tree
Prediction with ensembles of $B$ meta-trees
Problem setup
Sequential construction of meta-trees
Evaluation function for meta-trees
Prediction of $F_b(\bm{x})$ and $F_B(\bm{x})$
Model based on GBDT
Model with weights as probability distribution
A uniform distribution
A posterior distribution of $\bm{k}$
...and 9 more sections

Key Result

Theorem 1

Under the assumption of squared error loss, the optimal decision function $\delta^*(\bm{x}^n,y^n,\bm{x}_{n+1},\bm{k})$ that minimizes the Bayes risk function is given as follows:

Figures (7)

Figure 1: The notations for the binary model tree (left) and an example of the model tree (right). The subscript of $\bm{x}$ (red) represents the feature $k_s$, and if $x_{k_s}$ is a continuous variable, it is divided by a threshold value $t_{k_s}$. If $x_{k_s}$ is a binary variable, it is divided by a binary value of 0 or 1. If $\bm{x}$ is assigned to the root node of the model tree (right), following the red path leads to the leaf node $s_{01}$. The output is generated from $p(y|\theta_{s_{01}})$.
Figure 2: An example of the prior distribution on $\{T_0,T_1,T_2,T_3,T_4\}\in\mathcal{T}$.
Figure 3: An example of the meta-tree $\mathrm{M}_{T,\bm{k}}(\bm{k}=((k_1,t_{k_1}),(k_2,t_{k_2}),(k_3,t_{k_3})))$. The meta-tree represents the model tree candidate set enclosed in blue, and the largest model tree within the set (enclosed in red) is called the representative tree.
Figure 4: The result of Experiment 1
Figure 5: $D_{\mathrm{max}}^*=3$
...and 2 more figures

Theorems & Definitions (17)

Definition 1
Definition 2
Example 1
Theorem 1
Remark 1
Remark 2
Definition 3
Definition 4
Remark 3
Remark 4
...and 7 more
