Enhance Learning Efficiency of Oblique Decision Tree via Feature Concatenation

Shen-Huan Lyu, Yi-Xiao He, Yanyan Wang, Zhihao Qu, Bin Tang, Baoliu Ye

TL;DR

This work addresses the inefficiency of Oblique Decision Trees (ODT) arising from non-transmitted projection information along decision paths. It introduces FC-ODT, a single-tree method that uses feature concatenation to carry parent projections to child nodes, enabling in-model feature transformation and ridge-based oblique splits. Theoretical results show a faster consistency rate with tree depth $K$, achieving an $O(1/K^2)$ excess-risk decay, and empirical results on simulated and LIBSVM datasets demonstrate improved performance for shallow trees with modest computational overhead. The approach offers practical gains in efficiency and generalization for high-dimensional data and suggests avenues for extending FC-ODT ideas to random forests.

Abstract

Oblique Decision Tree (ODT) separates the feature space by linear projections, as opposed to the conventional Decision Tree (DT) that forces axis-parallel splits. ODT has been proven to have a stronger representation ability than DT, as it provides a way to create shallower tree structures while still approximating complex decision boundaries. However, its learning efficiency is still insufficient, since the linear projections cannot be transmitted to the child nodes, resulting in a waste of model parameters. In this work, we propose an enhanced ODT method with Feature Concatenation (FC-ODT), which enables in-model feature transformation to transmit the projections along the decision paths. Theoretically, we prove that our method enjoys a faster consistency rate w.r.t. the tree depth, indicating that our method possesses a significant advantage in generalization performance, especially for shallow trees. Experiments show that FC-ODT can outperform the other state-of-the-art decision trees with a limited tree depth.
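
The idea of transmitting projections by feature concatenation can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`ridge_direction`, `best_threshold`, `build_tree`, `predict_one`), the ridge penalty `lam`, the `min_leaf` guard, the squared-error threshold search, and the mean-valued leaves are all assumptions made for illustration. At each node, a ridge regression gives the oblique direction, the samples are split along that projection, and the projection value is appended to the feature vector that the children see.

```python
import numpy as np

def ridge_direction(X, y, lam=1.0):
    """Closed-form ridge weights on centered data (assumed split rule)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    d = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ yc)

def best_threshold(z, y, min_leaf=5):
    """Threshold on the 1-D projection z minimizing the children's squared error."""
    order = np.argsort(z)
    z_s, y_s = z[order], y[order]
    n = len(y_s)
    csum, csq = np.cumsum(y_s), np.cumsum(y_s ** 2)
    best_sse, best_thr = np.inf, None
    for i in range(min_leaf, n - min_leaf + 1):  # i = size of the left child
        if z_s[i] == z_s[i - 1]:
            continue  # cannot separate tied projection values
        sse_left = csq[i - 1] - csum[i - 1] ** 2 / i
        sse_right = (csq[-1] - csq[i - 1]) - (csum[-1] - csum[i - 1]) ** 2 / (n - i)
        if sse_left + sse_right < best_sse:
            best_sse = sse_left + sse_right
            best_thr = 0.5 * (z_s[i - 1] + z_s[i])
    return best_thr

def build_tree(X, y, max_depth, depth=0, lam=1.0, min_leaf=5):
    """Recursively grow a tree; children receive [x, projection] as features."""
    node = {"value": float(y.mean())}
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return node
    mu = X.mean(axis=0)
    w = ridge_direction(X, y, lam)
    z = (X - mu) @ w                      # oblique projection at this node
    thr = best_threshold(z, y, min_leaf)
    if thr is None:
        return node
    X_aug = np.column_stack([X, z])       # feature concatenation step
    left = z <= thr
    node.update(
        w=w, mu=mu, thr=thr,
        left=build_tree(X_aug[left], y[left], max_depth, depth + 1, lam, min_leaf),
        right=build_tree(X_aug[~left], y[~left], max_depth, depth + 1, lam, min_leaf),
    )
    return node

def predict_one(node, x):
    """Route one sample down the tree, concatenating each projection on the way."""
    while "w" in node:
        z = (x - node["mu"]) @ node["w"]
        x = np.append(x, z)               # same concatenation as in training
        node = node["left"] if z <= node["thr"] else node["right"]
    return node["value"]
```

In this sketch a node at depth $k$ works with $d + k$ features, so each child's ridge fit can reuse the parent's projection as a single coordinate instead of relearning it from the raw inputs. Prediction has to repeat the same concatenation along the decision path, which is why `predict_one` appends the projection before descending.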

Paper Structure

This paper contains 34 sections, 2 theorems, 40 equations, 4 figures, 2 tables, 3 algorithms.

Key Result

Lemma 1

If $T$ denotes a decision tree constructed by the FC-ODT method, then its output (eq:tree_estimation) admits an orthogonal expansion in the orthonormal decision stumps $\psi_t$, where ${\bm \psi}_t=(\psi_t({\boldsymbol{x}}_1),\dots,\psi_t({\boldsymbol{x}}_n))^\top$ is defined in Definition 1. By construction, $\|{\bm \psi}_t\|=1$ and $\langle{\bm \psi}_t,{\bm \psi}_{t'}\rangle=0$ are satisfied for distinct internal nodes $t$ and $t'$.
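
The two properties stated in the lemma can be checked numerically on a toy partition. The sketch below assumes the classical piecewise-constant form of an orthonormal decision stump and the empirical inner product $\langle u, v\rangle = \frac{1}{n}\sum_i u_i v_i$; the paper's Definition 1 may use a different form or normalization, and the helper name `stump_vector` is hypothetical.

```python
import numpy as np

def stump_vector(in_node, in_left):
    """Evaluate an (assumed) orthonormal decision stump at all n samples.

    in_node: boolean mask of samples in node t
    in_left: boolean mask of samples in the left child of t
    """
    in_left = in_left & in_node
    in_right = in_node & ~in_left
    n = len(in_node)
    w = in_node.sum() / n                      # fraction of samples in t
    p_left = in_left.sum() / in_node.sum()     # share of t sent left
    p_right = in_right.sum() / in_node.sum()   # share of t sent right
    return (in_left * p_right - in_right * p_left) / np.sqrt(w * p_left * p_right)

def inner(u, v):
    """Empirical inner product (assumed normalization)."""
    return float(np.mean(u * v))

# Toy partition of 12 samples: the root splits at index 7,
# and its left child splits again at index 3.
idx = np.arange(12)
root = np.ones(12, dtype=bool)
psi_root = stump_vector(root, idx < 7)
psi_child = stump_vector(idx < 7, idx < 3)

print(inner(psi_root, psi_root), inner(psi_child, psi_child), inner(psi_root, psi_child))
```

Under these assumptions the first two values are 1 and the third is 0, up to floating-point error: the child's stump sums to zero over its own node, and the root's stump is constant there, so their product averages to zero.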

Figures (4)

  • Figure 1: Illustration of our FC-ODT framework, where $[x,y]$ denotes the feature concatenation between $x$ and $y$.
  • Figure 2: MSE values with different maximum tree depths.
  • Figure 3: MSE values with different numbers of training samples.
  • Figure 4: Running time on ten datasets.

Theorems & Definitions (7)

  • Definition 1: Orthonormal decision stumps
  • Lemma 1: Orthogonal tree expansion
  • Remark 2
  • Definition 3: Ridge expansions [cattaneo2022convergence]
  • Definition 4: Total variation norm in node $t$
  • Theorem 5: Consistency rate for FC-ODT
  • Remark 6