Statistical Advantages of Oblique Randomized Decision Trees and Forests

Eliza O'Reilly

TL;DR

This paper develops oblique Mondrian tree and forest estimators, where splits occur along linear combinations of covariates via a STIT/Mondrian partitioning framework. It demonstrates that oblique splits enable adaptation to multi-index ridge function models, deriving risk and convergence bounds that depend on how well the split directions approximate the relevant subspace S and the geometry of the associated zonoid Π. The results show minimax-rate convergence under appropriate decay of the approximation error and establish suboptimality for axis-aligned Mondrian trees in general ridge-function settings, highlighting the statistical advantage of oblique splits for dimensionality reduction. By connecting oblique Mondrian processes to linear transformations, the work provides a theoretical basis for leveraging feature directions to combat the curse of dimensionality in regression while outlining practical directions for estimating the relevant subspace. Overall, the study offers a rigorous, geometry-driven justification for oblique, low-dimensional partitions in randomized forest methods and sets the stage for further development of data-driven direction learning.

Abstract

This work studies the statistical implications of using features composed of general linear combinations of covariates to partition the data in randomized decision tree and forest regression algorithms. Using random tessellation theory in stochastic geometry, we provide a theoretical analysis of a class of efficiently generated random tree and forest estimators that allow for oblique splits along such features. We call these estimators oblique Mondrian trees and forests, as the trees are generated by first selecting a set of features from linear combinations of the covariates and then running a Mondrian process that hierarchically partitions the data along these features. Generalization error bounds and convergence rates are obtained for the flexible function class of multi-index models for dimension reduction, where the output is assumed to depend on a low-dimensional relevant feature subspace of the input domain. The results highlight how the risk of these estimators depends on the choice of features and quantify how robust the risk is with respect to error in the estimation of relevant features. The asymptotic analysis also provides conditions on the consistency rates of the estimated features along which the data are split for these estimators to obtain minimax optimal rates of convergence with respect to the dimension of the relevant feature subspace. Additionally, a lower bound on the risk of axis-aligned Mondrian trees (where features are restricted to the set of covariates) is obtained, proving that these estimators are suboptimal for general ridge functions, no matter how the distribution over the covariates used to divide the data at each tree node is weighted.
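The two-step construction described in the abstract (choose features as linear combinations of the covariates, then run a Mondrian process along them) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `mondrian_leaves` and `oblique_mondrian_leaves`, the fixed `directions` matrix, and the choice to partition the bounding box of the projected data are all assumptions made for illustration. The recursion itself follows the standard Mondrian process on a box: the waiting time to the next cut is exponential with rate equal to the sum of the side lengths, the cut axis is drawn with probability proportional to side length, and the cut location is uniform along that side.

```python
import numpy as np

def mondrian_leaves(lower, upper, lifetime, rng, t=0.0):
    """Return the leaf boxes (lower, upper) of a Mondrian process on a box."""
    lengths = upper - lower
    rate = lengths.sum()
    # Waiting time to the next cut is exponential with rate = sum of side lengths.
    t_split = t + rng.exponential(1.0 / rate) if rate > 0 else np.inf
    if t_split > lifetime:
        return [(lower.copy(), upper.copy())]
    # Cut axis chosen with probability proportional to side length,
    # cut location uniform along that side.
    axis = rng.choice(len(lengths), p=lengths / rate)
    cut = rng.uniform(lower[axis], upper[axis])
    left_upper = upper.copy()
    left_upper[axis] = cut
    right_lower = lower.copy()
    right_lower[axis] = cut
    return (mondrian_leaves(lower, left_upper, lifetime, rng, t_split)
            + mondrian_leaves(right_lower, upper, lifetime, rng, t_split))

def oblique_mondrian_leaves(X, directions, lifetime, seed=0):
    """Project X onto the rows of `directions` (hypothetical, user-supplied),
    then partition the bounding box of the projected features."""
    rng = np.random.default_rng(seed)
    Z = X @ directions.T  # each column of Z is one linear-combination feature
    return Z, mondrian_leaves(Z.min(axis=0), Z.max(axis=0), lifetime, rng)
```

Each leaf is an axis-aligned box in the feature space, which corresponds to a cell with oblique faces in the original covariate space. A regression tree estimate under this sketch would average the responses of the training points whose projected features fall in the same leaf as the query point.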

Paper Structure (23 sections, 19 theorems, 172 equations, 2 figures)

Introduction
Outline
Background
Stable Under Iteration (STIT) Tessellations
Cells of stationary random tessellations
Associated zonoid
Intrinsic Volumes and Mixed Volumes
Regression Setting and Risk Bounds
Risk Bounds for Ridge Functions
Convergence Rates for Oblique Mondrian Trees and Forests
Risk Bounds for Weighted Mondrian Forests
Suboptimality of Mondrian trees for estimating ridge functions
Oblique Mondrian Processes
Conclusion
Selected Proofs
...and 8 more sections

Key Result

Theorem 6

Assume $\mathrm{supp}(\mu) \subseteq B^d$ and $f$ satisfies the ridge-function assumption (e:ridge_fxn_assump) with $\tilde{g} \in \mathcal{C}^{0,\beta}(L)$ for some $L > 0$ and subspace $S$ of dimension $s \leq d$. Let $\hat{f}_{n} = \hat{f}_{n, M, \lambda, \Pi}$ be a random tessellation forest estimator with normalized associate
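The ridge-function assumption cited in Theorem 6 is not reproduced in this excerpt. As a reading aid, the block below writes out the standard multi-index form that is consistent with the symbols $f$, $\tilde{g}$, $S$, and $\mathcal{C}^{0,\beta}(L)$ used in the statement. The projection notation $P_S$, the restriction $\beta \in (0,1]$, and the exact form of the Hölder condition are assumptions here, not quotations from the paper. The final line records the classical minimax rate for $\beta$-Hölder regression in $s$ dimensions, which is the benchmark the abstract refers to when it speaks of optimality with respect to the dimension of the relevant feature subspace.

```latex
% Multi-index (ridge function) model: f depends on x only through its
% orthogonal projection P_S x onto the s-dimensional relevant subspace S.
f(x) = \tilde{g}(P_S x), \qquad x \in \mathbb{R}^d, \quad \dim S = s \le d .

% Hoelder class C^{0,\beta}(L), assumed here with 0 < \beta \le 1:
|\tilde{g}(u) - \tilde{g}(v)| \le L \, \|u - v\|^{\beta}
\quad \text{for all } u, v \in S .

% Classical minimax rate for this class in dimension s
% (compare with n^{-2\beta/(2\beta + d)} when no low-dimensional structure is used):
\mathbb{E}\big[ (\hat{f}_n(X) - f(X))^2 \big] \asymp n^{-2\beta/(2\beta + s)} .
```

Since $s \le d$, the exponent $2\beta/(2\beta+s)$ is at least as large as $2\beta/(2\beta+d)$, which is the sense in which adapting to $S$ mitigates the curse of dimensionality.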

Figures (2)

Figure 1: An illustration of (a) a weighted Mondrian process with its associated zonoid $\Pi$ as in Example \ref{ex:PI_Mondrian} and (b) an oblique Mondrian process and its associated zonoid $\Pi$ as in Example \ref{ex:PI_oblique_Mondrian}.
Figure 2: Illustration of an associated zonoid and corresponding STIT tessellation in relation to a relevant feature subspace $S$. If the projection of $\Pi$ onto $S^{\perp}$ is small, then $S$ is cut more frequently by the boundaries of the STIT tessellation for a given lifetime.

Theorems & Definitions (43)

Definition 1
Example 2
Example 3
Example 4
Definition 5
Theorem 6
Theorem 7
Theorem 8
Corollary 9
Theorem 10
...and 33 more
