Decision Tree Embedding by Leaf-Means

Cencheng Shen, Yuexiao Dong, Carey E. Priebe

TL;DR

Decision Tree Embedding (DTE) addresses the variance and interpretability trade-offs of standard trees and forests by turning leaf-region means into a data-driven, anchor-based embedding. A single tree (or a small ensemble) yields an embedding Z that, when combined with a linear classifier such as LDA, achieves competitive accuracy with substantially less training time than random forests and shallow neural networks. The authors prove two population-level guarantees: (i) conditional label distributions are preserved up to an epsilon bound under Bayes-homogeneous partitions, and (ii) the population classification error of the induced embedding is bounded. Empirically, DTE performs strongly on synthetic and real datasets, offering a scalable, interpretable alternative to traditional tree ensembles with favorable accuracy-time trade-offs.

Abstract

Decision trees and random forests remain highly competitive for classification on medium-sized, standard datasets due to their robustness, minimal preprocessing requirements, and interpretability. However, a single tree suffers from high estimation variance, while large ensembles reduce this variance at the cost of substantial computational overhead and diminished interpretability. In this paper, we propose Decision Tree Embedding (DTE), a fast and effective method that leverages the leaf partitions of a trained classification tree to construct an interpretable feature representation. By using the sample means within each leaf region as anchor points, DTE maps inputs into an embedding space defined by the tree's partition structure, effectively circumventing the high variance inherent in decision-tree splitting rules. We further introduce an ensemble extension based on additional bootstrap trees, and pair the resulting embedding with linear discriminant analysis for classification. We establish several population-level theoretical properties of DTE, including its preservation of conditional density under mild conditions and a characterization of the resulting classification error. Empirical studies on synthetic and real datasets demonstrate that DTE strikes a strong balance between accuracy and computational efficiency, outperforming or matching random forests and shallow neural networks while requiring only a fraction of their training time in most cases. Overall, the proposed DTE method can be viewed either as a scalable decision tree classifier that improves upon standard split rules, or as a neural network model whose weights are learned from tree-derived anchor points, achieving an intriguing integration of both paradigms.
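The pipeline described in the abstract (fit a tree, take per-leaf sample means as anchors, embed, then classify with linear discriminant analysis) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The affine form $Z=X\mathbf{W}^\top+\mathbf{1}\mathbf{b}^\top$ comes from the paper, but the exact definitions of $\mathbf{W}$ and $\mathbf{b}$ are not reproduced on this page, so the sketch assumes that the rows of $\mathbf{W}$ are the leaf means $\mu_j$ and that $b_j=-\tfrac{1}{2}\|\mu_j\|^2$. The helper name `leaf_mean_embedding`, the toy data, and all hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def leaf_mean_embedding(tree, X_train):
    """Build (W, b) from the per-leaf sample means of X_train.

    ASSUMPTION (not taken from the paper): rows of W are the leaf means
    mu_j, and b_j = -0.5 * ||mu_j||^2.
    """
    leaf_ids = tree.apply(X_train)  # leaf node index for each training row
    W = np.vstack([X_train[leaf_ids == leaf].mean(axis=0)
                   for leaf in np.unique(leaf_ids)])
    b = -0.5 * np.sum(W ** 2, axis=1)
    return W, b


# Hypothetical toy data in p = 10 dimensions: class 0 is a mixture of two
# clusters, class 1 is a single cluster; the remaining coordinates are noise.
rng = np.random.default_rng(0)
p = 10
centers = np.zeros((3, p))
centers[0, 0], centers[1, 0], centers[2, 1] = -3.0, 3.0, 3.0
cluster = rng.integers(0, 3, size=600)
X = centers[cluster] + rng.normal(scale=0.7, size=(600, p))
y = (cluster == 2).astype(int)

X_train, y_train = X[:400], y[:400]
X_test, y_test = X[400:], y[400:]

# A small tree supplies the partition; its leaves supply the anchor points.
tree = DecisionTreeClassifier(max_leaf_nodes=4, random_state=0)
tree.fit(X_train, y_train)
W, b = leaf_mean_embedding(tree, X_train)

# Affine embedding Z = X W^T + 1 b^T, one coordinate per leaf.
Z_train = X_train @ W.T + b
Z_test = X_test @ W.T + b

# Linear classifier on top of the embedding.
lda = LinearDiscriminantAnalysis().fit(Z_train, y_train)
accuracy = lda.score(Z_test, y_test)
print(f"embedding dimension: {Z_train.shape[1]}, test accuracy: {accuracy:.3f}")
```

Under the assumed bias term, $z_j = x^\top\mu_j - \tfrac{1}{2}\|\mu_j\|^2$, so the largest coordinate of $Z$ always corresponds to the leaf mean nearest to $x$ in Euclidean distance. In this toy setup the embedding has at most four coordinates, fewer than the ten input features, so the classifier is fit in a lower-dimensional space.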

Paper Structure

This paper contains 15 sections, 4 theorems, 36 equations, 2 figures, 1 table.

Key Result

Theorem 1

Let $(X,Y)$ be a random pair with $X\in\mathbb{R}^p$ and $Y\in [K]$. Let $\{\mathcal{R}_1,\ldots,\mathcal{R}_m\}$ be the set of leaf regions from a classification tree, and let $Z=X\mathbf{W}^\top+\mathbf{1}\mathbf{b}^\top$ denote its Decision Tree Embedding, where [...] If the partition $\{\mathcal{R}_j\}$ is $\varepsilon$-Bayes-homogeneous, then [...] In particular, when $\varepsilon=0$, the DTE embedding [...]

Figures (2)

  • Figure 1: Top left: simulated training data in the original two-dimensional space. The two clusters of class 1 are shown in light and dark blue, respectively, while class 2 is shown in red. Empirical cluster means are indicated by bolded markers in matching colors. Top right: leaf assignments and leaf means of the fitted decision tree. Bottom left: the three-dimensional DTE embedding $\mathbf{Z}$ obtained using the tree-derived leaf means. Samples are colored by their true cluster labels. Bottom right: the oracle embedding $\mathbf{Z}^{*}$, obtained by replacing the leaf means with the true cluster means.
  • Figure 2: This figure shows the running time, in seconds, of each method on each dataset, including both training and testing. The reported time for each dataset is the average over all cross-validation runs.

Theorems & Definitions (7)

  • Definition 1
  • Theorem 1
  • Theorem 2
  • Theorem 2
  • proof
  • Theorem 2
  • proof