Optimizing High-Dimensional Oblique Splits

Chien-Ming Chi

TL;DR

This work develops a theory and practical framework for optimizing high-dimensional $s$-sparse oblique splits under the Sufficient Impurity Decrease (SID) criterion. It introduces a progressive, transfer-learning-inspired scheme that iteratively grows oblique trees by reusing a limited set of splits, and then combines these splits with orthogonal splits in Random Forests (RF+$\mathcal{S}^{(b)}$). The authors establish non-asymptotic SID convergence rates for sparse oblique splits, reveal a fundamental trade-off between SID class size (via $s_0$) and computational cost (scaling with $\binom{p}{s_0}$), and provide memory-transfer results that enable efficient learning and early stopping. Empirically, the framework can learn complex functions such as the $s_0$-dimensional XOR in high dimensions, and it performs competitively on real-world datasets relative to the Forest-RC and MORF baselines. The work also offers an open-source Python implementation, highlighting its practical applicability to scalable oblique-tree-based prediction in tall data settings.
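
To give a feel for the $\binom{p}{s_0}$ scaling mentioned above, the following minimal sketch counts how many distinct supports (sets of features with nonzero weight) an $s_0$-sparse split can have. The function name `num_supports` and the choice $p = 100$ are illustrative assumptions, not taken from the paper or its implementation.

```python
from math import comb


def num_supports(p: int, s0: int) -> int:
    """Number of ways to choose which s0 of the p features carry nonzero weight.

    Hypothetical helper for illustration only; it counts supports,
    not the (continuous) set of weight vectors on each support.
    """
    return comb(p, s0)


# Illustrative values: p = 100 features, sparsity levels 1 to 3.
counts = {s0: num_supports(100, s0) for s0 in (1, 2, 3)}
for s0, c in counts.items():
    print(f"s0={s0}: {c} possible supports")
```

Even at this modest $p$, each unit increase in $s_0$ multiplies the number of supports by a large factor, which is the combinatorial side of the trade-off the summary describes.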

Abstract

Orthogonal-split trees perform well, but evidence suggests oblique splits can enhance their performance. This paper explores optimizing high-dimensional $s$-sparse oblique splits from $\{(\vec{w}, \vec{w}^{\top}\boldsymbol{X}_{i}) : i\in \{1,\dots, n\}, \vec{w} \in \mathbb{R}^p, \| \vec{w} \|_{2} = 1, \| \vec{w} \|_{0} \leq s \}$ for growing oblique trees, where $ s $ is a user-defined sparsity parameter. We establish a connection between SID convergence and $s_0$-sparse oblique splits with $s_0\ge 1$, showing that the SID function class expands as $s_0$ increases, enabling the capture of more complex data-generating functions such as the $s_0$-dimensional XOR function. Thus, $s_0$ represents the unknown potential complexity of the underlying data-generating function. Learning these complex functions requires an $s$-sparse oblique tree with $s \geq s_0$ and greater computational resources. This highlights a trade-off between statistical accuracy, governed by the SID function class size depending on $s_0$, and computational cost. In contrast, previous studies have explored the problem of SID convergence using orthogonal splits with $ s_0 = s = 1 $, where runtime was less critical. Additionally, we introduce a practical framework for oblique trees that integrates optimized oblique splits alongside orthogonal splits into random forests. The proposed approach is assessed through simulations and real-data experiments, comparing its performance against various oblique tree models.
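
The candidate set in the abstract pairs a unit-norm direction $\vec{w}$ having at most $s$ nonzero entries with a threshold $\vec{w}^{\top}\boldsymbol{X}_{i}$ taken at a sample point. A minimal sketch of drawing one such direction and its $n$ candidate thresholds is given below. The function names and the Gaussian draw of the nonzero weights are assumptions made for illustration; the paper's own sampling and optimization scheme may differ.

```python
import numpy as np


def sample_sparse_direction(p: int, s: int, rng: np.random.Generator) -> np.ndarray:
    """Draw w in R^p with ||w||_2 = 1 and ||w||_0 <= s.

    Assumption: the support is chosen uniformly at random and the
    nonzero weights are standard normal before normalization.
    """
    support = rng.choice(p, size=s, replace=False)
    w = np.zeros(p)
    w[support] = rng.standard_normal(s)
    return w / np.linalg.norm(w)


def candidate_thresholds(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Return the n thresholds w^T X_i, one per sample row of X."""
    return X @ w


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 10))           # n = 5 samples, p = 10 features
w = sample_sparse_direction(p=10, s=3, rng=rng)
thresholds = candidate_thresholds(X, w)    # each (w, thresholds[i]) is one candidate split
```

Setting `s=1` reduces each candidate to an orthogonal split on a single feature (up to sign), which corresponds to the $s_0 = s = 1$ case the abstract contrasts against.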

Paper Structure

This paper contains 42 sections, 9 theorems, 271 equations, 9 tables.

Key Result

Proposition 1

Assume Condition regular.tree. Then, 1) For each $b\ge b_{1}$ and $h\in \{1, \dots, H\}$, the event $\Theta_{b_{1}, h} = \{ \widehat{N}_{q}^{(0)} \textnormal{ and } \widehat{N}_{q}^{(b_{1})} \textnormal{ are sample-equivalent for } q\in \{1,\dots, h\} \}$ is a subset of $\Theta_{b, h}$. 2) With pro

Theorems & Definitions (14)

  • Proposition 1
  • Theorem 2
  • Corollary 3
  • Example 1
  • Example 2
  • Example 3
  • Example 4
  • Example 5
  • Theorem 4
  • Lemma 5
  • ...and 4 more