FoLDTree: A ULDA-Based Decision Tree Framework for Efficient Oblique Splits and Feature Selection

Siyu Wang, Kehui Yao

TL;DR

This work introduces LDATree and FoLDTree, two ULDA-based decision-tree frameworks designed to implement efficient oblique splits while robustly handling missing values and enabling feature selection. By embedding ULDA and Forward ULDA within a recursive tree structure, the methods produce multi-class, probability-enabled splits with strong predictive performance, approaching that of random forests on many datasets. Empirical results across simulations and real-world data demonstrate improved robustness to noise, capability to capture high-order interactions, and competitive accuracy relative to established oblique trees and orthogonal methods. The approaches offer practical benefits as robust single-tree alternatives and lay groundwork for future ensembles and SVM-inspired enhancements.

Abstract

Traditional decision trees are limited by axis-orthogonal splits, which can perform poorly when true decision boundaries are oblique. While oblique decision tree methods address this limitation, they often face high computational costs, difficulties with multi-class classification, and a lack of effective feature selection. In this paper, we introduce LDATree and FoLDTree, two novel frameworks that integrate Uncorrelated Linear Discriminant Analysis (ULDA) and Forward ULDA into a decision tree structure. These methods enable efficient oblique splits, handle missing values, support feature selection, and provide both class labels and probabilities as model outputs. Through evaluations on simulated and real-world datasets, LDATree and FoLDTree consistently outperform axis-orthogonal and other oblique decision tree methods, achieving accuracy levels comparable to the random forest. The results highlight the potential of these frameworks as robust alternatives to traditional single-tree methods.

Paper Structure

This paper contains 17 sections, 4 theorems, 7 equations, 8 figures, 1 table.

Key Result

Lemma 1

For a numerical predictor $\mathbf{X}$ containing missing values, we impute with a constant $C$ and add the missing-value indicator $X^{-} = I(X = \text{NA})$. Then the column space of $\{\mathbf{X}, \mathbf{X^{-}}\}$ does not depend on the choice of $C$.
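
A small numerical check can make the lemma concrete. The sketch below is not taken from the paper: the helper names `design_with_indicator` and `projection` and the toy data are assumptions made for illustration. It imputes the same predictor with two different constants, appends the missing-value indicator, and compares the orthogonal projectors onto the resulting column spaces. Two matrices span the same column space exactly when their projectors coincide.

```python
import numpy as np

def design_with_indicator(x, c):
    """Impute NaNs in x with the constant c and append the missing-value indicator."""
    missing = np.isnan(x)
    imputed = np.where(missing, c, x)
    return np.column_stack([imputed, missing.astype(float)])

def projection(a):
    """Orthogonal projector onto the column space of a (via the pseudoinverse)."""
    return a @ np.linalg.pinv(a)

# Toy predictor with two missing entries (illustrative data only).
x = np.array([1.0, np.nan, 3.0, 4.5, np.nan, -2.0])

p_zero = projection(design_with_indicator(x, 0.0))    # impute with C = 0
p_far = projection(design_with_indicator(x, 100.0))   # impute with C = 100

# If the column space does not depend on C, the two projectors agree.
same_space = np.allclose(p_zero, p_far)
print(same_space)
```

The reason is that the imputed column equals the observed values (zero at the missing entries) plus $C$ times the indicator column, so changing $C$ only adds a multiple of a column that is already in the span.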

Figures (8)

  • Figure 1: Decision boundaries from the decision tree (rpart in R) when the decision boundary is not orthogonal to the axes (Section \ref{sec:Introduction}). (a) 2D decision boundary with the actual boundary represented by a straight line, and the staircase boundary fitted by rpart. The colored background represents the predicted regions. (b) 3D decision boundary, where the gray surface represents the actual boundary, and the red step-function surface is fitted by rpart.
  • Figure 2: Decision boundaries from LDA (Section \ref{sec:Introduction}). The colored background represents LDA prediction regions, and the labels indicate the class centroids. (a) LDA decision boundary before splitting. (b) LDA decision boundary after splitting on $x \leq 0$.
  • Figure 3: Simulated scenario where LDA splitting fails and our approach to address it (Section \ref{subsec:splitting}). (a) Scatter plot showing class $A$ dominating class $B$. (b) Posterior probabilities from LDA, where class $A$'s dominance leads to no splits under the estimated priors. Red dots on the x-axis represent class centers. (c) Posterior probabilities from LDA with equal priors, allowing LDA to split at the intersection of the two densities. (d) First and second splits in LDATree, demonstrating effective class separation.
  • Figure 4: Simulated scenario where the split strength $\alpha$ from CART may be misleading (Section \ref{subsec:stoppingRule}). In the tree plots, the numbers $N_{\text{mis}}/N_{\text{total}}$ next to the nodes indicate that out of $N_{\text{total}}$ samples, $N_{\text{mis}}$ were misclassified (training errors).
  • Figure 5: Illustration of the LDATree algorithm (Section \ref{subsec:algorithmIllustration}). The right column shows the original and predicted patterns, while the left column shows how the decision tree divides the sample space. The tree is read from top to bottom, with additional splits introduced at each level. The final row presents the post-pruning results.
  • ...and 3 more figures

Theorems & Definitions (8)

  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Theorem 1
  • proof