Sparse Projection Oblique Randomer Forests

Tyler M. Tomita; James Browne; Cencheng Shen; Jaewon Chung; Jesse L. Patsolic; Benjamin Falk; Jason Yim; Carey E. Priebe; Randal Burns; Mauro Maggioni; Joshua T. Vogelstein

TL;DR

Sparse Projection Oblique Randomer Forests (SPORF) introduces a decision forest that uses very sparse random projections to generate oblique splits, preserving axis-aligned forest benefits such as robustness, interpretability, and efficiency while enhancing accuracy over existing oblique methods. Through extensive simulations and 105 real-world benchmarks, SPORF demonstrates strong, consistent performance, robustness to hyperparameters and high-dimensional noise, and favorable scaling comparable to Random Forests. Theoretical analyses and empirical results show SPORF achieves competitive time and space complexity, with practical implementations in R and Python that enable parallelization and fast inference, including a Forest Packing acceleration for prediction. Overall, SPORF provides a scalable, interpretable, and accurate alternative to axis-aligned RF and existing oblique forests, with strong potential for integration into boosting and other ensemble frameworks.

Abstract

Decision forests, including Random Forests and Gradient Boosting Trees, have recently demonstrated state-of-the-art performance in a variety of machine learning settings. Decision forests are typically ensembles of axis-aligned decision trees; that is, trees that split only along feature dimensions. In contrast, many recent extensions to decision forests are based on axis-oblique splits. Unfortunately, these extensions forfeit one or more of the favorable properties of decision forests based on axis-aligned splits, such as robustness to many noise dimensions, interpretability, or computational efficiency. We introduce yet another decision forest, called "Sparse Projection Oblique Randomer Forests" (SPORF). SPORF uses very sparse random projections, i.e., linear combinations of a small subset of features. SPORF significantly improves accuracy over existing state-of-the-art algorithms on a standard benchmark suite for classification with more than 100 problems of varying dimension, sample size, and number of classes. To illustrate how SPORF addresses the limitations of both axis-aligned and existing oblique decision forest methods, we conduct extensive simulated experiments. SPORF typically yields improved performance over existing decision forests, while maintaining computational efficiency, scalability, and interpretability. SPORF can easily be incorporated into other ensemble methods such as boosting to obtain potentially similar gains.
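
As an illustration of what "linear combinations of a small subset of features" can look like in practice, the following is a minimal sketch, not the authors' implementation. It assumes that each projection is a column whose entries are drawn from {-1, 0, +1} and that an entry is nonzero with a small probability; the function name `sparse_projection_matrix` and the parameter `density` are hypothetical names chosen for this example.

```python
import numpy as np

def sparse_projection_matrix(n_features, n_projections, density, rng):
    """Sample a matrix whose columns are sparse vectors over {-1, 0, +1}.

    Assumption for illustration: each entry is nonzero independently with
    probability `density`, and nonzero entries are +1 or -1 with equal odds.
    """
    shape = (n_features, n_projections)
    mask = rng.random(shape) < density                 # which entries are nonzero
    signs = rng.choice(np.array([-1.0, 1.0]), size=shape)  # random sign per entry
    return np.where(mask, signs, 0.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))   # 100 samples, 20 original features
A = sparse_projection_matrix(n_features=20, n_projections=5, density=0.1, rng=rng)

# Each column of Z is a candidate oblique split feature: a signed sum of the
# few original features selected by the matching column of A.
Z = X @ A
```

A tree node would then search for the best threshold over the columns of `Z` instead of over the original columns of `X`. When a column of `A` has a single nonzero entry, the candidate reduces to an ordinary axis-aligned split (up to sign).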
