An Algorithmic Framework for Constructing Multiple Decision Trees by Evaluating Their Combination Performance Throughout the Construction Process

Keito Tajima, Naoki Ichijo, Yuta Nakahara, Toshiyasu Matsushima

TL;DR

The paper aims to improve predictions from multiple decision trees by evaluating the ensemble (tree-combined) performance during construction, rather than only after all trees have been built. It introduces a grow/select framework that builds $B$ trees simultaneously: a Grow step expands the candidate trees, and a Select step keeps the combinations of trees that perform best. The final prediction is the average of the leaf predictions of the selected trees, $\hat{y}(\bm{x}) = \frac{1}{|M|} \sum_{(T,\bm{k}) \in M} \hat{y}_{s_{T,\bm{k}}(\bm{x})}$, where $s_{T,\bm{k}}(\bm{x})$ is the leaf of tree $(T,\bm{k})$ that $\bm{x}$ falls into. The ensemble loss $L(M) = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}(\bm{x}_i))^2$ guides the selection. Experiments on synthetic Friedman data and standard benchmark datasets show that evaluating tree-combined performance during construction (especially with blocked greedy search) can achieve lower prediction error than random forests (RF) in several settings. The framework extends to classification and admits various algorithmic choices for the Grow and Select steps, offering a principled alternative to bagging and boosting.
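The two formulas above can be made concrete with a minimal Python sketch. It assumes each internal node stores a (feature index, threshold) pair and sends $\bm{x}$ to the left child when the feature value is at most the threshold; this split convention and the hypothetical `Node`, `leaf_of`, `ensemble_predict`, and `ensemble_loss` names are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Hypothetical binary tree node: internal nodes split, leaves store y_hat_s."""
    feature: Optional[int] = None      # 0-based feature index (assumed split form)
    threshold: Optional[float] = None  # assumed rule: go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0                 # leaf prediction y_hat_s (unused at internal nodes)

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def leaf_of(tree: Node, x: Sequence[float]) -> Node:
    """Follow the splits from the root to the leaf s_{T,k}(x) that x falls into."""
    node = tree
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node


def ensemble_predict(trees: Sequence[Node], x: Sequence[float]) -> float:
    """y_hat(x): average of the leaf predictions of all trees in M."""
    return sum(leaf_of(t, x).value for t in trees) / len(trees)


def ensemble_loss(trees: Sequence[Node], X: Sequence[Sequence[float]],
                  y: Sequence[float]) -> float:
    """L(M): mean squared error of the averaged prediction over the n samples."""
    return sum((yi - ensemble_predict(trees, xi)) ** 2 for xi, yi in zip(X, y)) / len(y)


# Usage: two depth-1 trees (stumps) on 2-dimensional inputs.
t1 = Node(feature=0, threshold=0.5, left=Node(value=1.0), right=Node(value=3.0))
t2 = Node(feature=1, threshold=0.0, left=Node(value=0.0), right=Node(value=2.0))
X = [[0.2, -1.0], [0.8, 1.0]]
y = [1.0, 2.5]
print(ensemble_predict([t1, t2], X[0]))  # 0.5   (= (1.0 + 0.0) / 2)
print(ensemble_predict([t1, t2], X[1]))  # 2.5   (= (3.0 + 2.0) / 2)
print(ensemble_loss([t1, t2], X, y))     # 0.125 (= ((1.0 - 0.5)**2 + 0) / 2)
```

Each tree contributes only the value of the single leaf that $\bm{x}$ reaches, so the ensemble prediction is a plain average over $|M|$ leaf values.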

Abstract

Predictions using a combination of decision trees are known to be effective in machine learning. Typical approaches for constructing a combination of decision trees for prediction are bagging and boosting. Bagging constructs decision trees independently, without evaluating their combination performance, and averages them afterward. Boosting constructs decision trees sequentially, evaluating at each step only the combination performance of a new decision tree and the fixed past decision trees. Therefore, neither method directly constructs or evaluates the combination of decision trees used for the final prediction. When the final prediction is based on a combination of decision trees, it is natural to evaluate the appropriateness of the combination while constructing the trees. In this study, we propose a new algorithmic framework that constructs decision trees simultaneously and evaluates their combination performance throughout the construction process. Our framework repeats two procedures. In the first procedure, we construct new candidate combinations of decision trees in order to find a proper combination. In the second procedure, we evaluate the performance of each combination of decision trees under some criteria and select a better combination. To confirm the performance of the proposed framework, we perform experiments on synthetic and benchmark data.
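The two repeated procedures can be written as a generic loop. In this minimal sketch the hypothetical `grow` and `select` callables stand in for whatever Grow and Select algorithms are chosen, and the fixed number of rounds `n_rounds` is an assumed stopping rule, not one taken from the paper.

```python
from typing import Callable, List, TypeVar

Cand = TypeVar("Cand")  # a candidate combination of trees (representation left open)


def grow_select(initial: List[Cand],
                grow: Callable[[List[Cand]], List[Cand]],
                select: Callable[[List[Cand]], List[Cand]],
                n_rounds: int) -> List[Cand]:
    """Alternate the two procedures for a fixed number of rounds."""
    candidates = initial
    for _ in range(n_rounds):
        expanded = grow(candidates)    # procedure 1: propose new candidate combinations
        candidates = select(expanded)  # procedure 2: evaluate them and keep the better ones
    return candidates


# Toy usage: candidates are integers, "grow" proposes c + 1 and c + 3,
# and "select" keeps the two smallest (standing in for the lowest loss).
result = grow_select([0], lambda cs: [c + d for c in cs for d in (1, 3)],
                     lambda cs: sorted(cs)[:2], n_rounds=2)
print(result)  # [2, 4]
```

Keeping Grow and Select as interchangeable functions mirrors the framework's point that different algorithmic choices can be plugged into each step.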

Paper Structure (15 sections, 10 equations, 5 figures, 4 tables, 4 algorithms)

Figures (5)

  • Figure 1: An example of the notation. In this figure, the parameters are the following: $D_{\text{max}}=2$, $\bm{k}=((3,3),(1,\frac{3}{2}),(2,\frac{1}{3}))$, $\mathcal{I}_T=\{s_\lambda,s_0,s_1\}$ (internal nodes), $\mathcal{L}_T=\{s_{00},s_{01},s_{10},s_{11}\}$ (leaf nodes, painted gray), and $s_{T,\bm{k}}(\bm{x})=s_{00}$.
  • Figure 2: An example of the basic algorithm ($B=3$). In this figure, "grow" and "select" are each performed twice. These operations are repeated to construct the $B$ decision trees used for the final prediction.
  • Figure 3: An example of "grow." In this figure, the parameters are the following: $m_{\text{leaf}}=2$, $B_{\text{keep}}=3$. Therefore, we select the two leaf nodes with the largest $h(s)$, make new trees from them, and keep the three trees with the largest gain function values.
  • Figure 4: An example of the greedy search. In this figure, the parameters are the following: $B=3$, $C=2$. First, we evaluate the prediction performance of each single decision tree with the evaluation function and keep the two decision trees with the lowest values. Next, we consider "tree-combined predictions" that combine each kept tree with a newly added tree. We evaluate the performance of each "tree-combined prediction" and keep the two combinations with the lowest evaluation function values. In this example, there are four combinations because each of the two kept trees is combined with a new tree, but we keep only the two with the lowest evaluation function values (painted gray). Finally, we consider "tree-combined predictions" that combine each selected two-tree combination with a newly added tree, evaluate the performance of each, and keep the combination with the lowest evaluation function value (painted yellow).
  • Figure 5: An example of the blocked greedy search. In this figure, the parameters are the following: $B=3$, $C=2$. The difference from the greedy search is the search set used when adding a new tree: the new tree must be selected from the corresponding block.
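The greedy and blocked greedy searches of Figures 4 and 5 can both be read as a beam search that adds one tree per step and keeps the $C$ lowest-loss combinations. This sketch rests on several assumptions not stated in the figures: trees are referred to by integer ids, a tree is used at most once per combination, duplicate combinations are merged, the evaluation function is the ensemble MSE $L(M)$, and for the blocked variant the candidate trees are already partitioned into $B$ blocks, one per step (how the blocks are formed is not specified here). All function names are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Combo = Tuple[int, ...]  # a combination of trees, given as sorted tree ids


def _beam_search(step_pools: Sequence[Sequence[int]], width: int,
                 loss: Callable[[Combo], float]) -> Combo:
    """Add one tree per step, keeping the `width` combinations with the lowest loss."""
    beam: List[Combo] = [()]
    for pool in step_pools:
        scored: Dict[Combo, float] = {}
        for combo in beam:
            for tree in pool:
                if tree in combo:
                    continue  # assumption: a tree appears at most once per combination
                new = tuple(sorted(combo + (tree,)))  # order-free key merges duplicates
                if new not in scored:
                    scored[new] = loss(new)
        if not scored:
            raise ValueError("not enough distinct trees to extend the combinations")
        beam = heapq.nsmallest(width, scored, key=scored.__getitem__)
    return beam[0]  # the lowest-loss combination of the final size


def greedy_select(n_trees: int, B: int, C: int, loss: Callable[[Combo], float]) -> Combo:
    """Greedy search: every step may add any tree from the whole pool."""
    return _beam_search([range(n_trees)] * B, C, loss)


def blocked_greedy_select(blocks: Sequence[Sequence[int]], C: int,
                          loss: Callable[[Combo], float]) -> Combo:
    """Blocked greedy search: step b may add only a tree from blocks[b]."""
    return _beam_search(blocks, C, loss)


def make_mse_loss(preds: Sequence[Sequence[float]], y: Sequence[float]):
    """L(M) for a combination, given each tree's predictions on the training points."""
    def loss(combo: Combo) -> float:
        errs = ((yi - sum(preds[t][i] for t in combo) / len(combo)) ** 2
                for i, yi in enumerate(y))
        return sum(errs) / len(y)
    return loss


# Usage on toy predictions for 3 training points.
y = [1.0, 2.0, 3.0]
preds = [[1.0, 2.0, 2.0],   # tree 0: undershoots the last point
         [1.0, 2.0, 4.0],   # tree 1: overshoots the last point
         [0.0, 0.0, 0.0],   # tree 2: poor everywhere
         [2.0, 2.0, 3.0]]   # tree 3: misses the first point
loss = make_mse_loss(preds, y)
print(greedy_select(n_trees=4, B=2, C=2, loss=loss))            # (0, 1), loss 0.0
print(blocked_greedy_select([[0, 1], [2, 3]], C=2, loss=loss))  # (0, 3)
```

In the toy data, trees 0 and 1 err in opposite directions on the last point, so the plain greedy search pairs them for zero loss. Placing both trees in the same block rules out that pairing, which illustrates how the block structure restricts the search set at each step.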

Theorems & Definitions (6)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6