Grafting: Making Random Forests Consistent

Nicholas Waltz

TL;DR

The paper addresses the inconsistency of canonical Random Forests by grafting consistent leaf estimators onto a shallow CART, and formalizes $L^2$ consistency for the resulting ensemble. It introduces Algorithms (C) and (C*) and proves a bias-variance decomposition and overall consistency under sparsity, with the parameters $a_n$, $q_n$, and $\alpha_n$ governing the trade-offs. Empirically, grafted trees perform on par with Breiman's RF and outperform Centered Forests on standard benchmarks, and the feature selection performed by the CART step is shown to make the method robust in high dimensions. The work suggests grafted trees as a viable, theoretically grounded alternative for high-dimensional predictive modeling, with potential benefits for causal inference.
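
For reference, the standard definition of $L^2$ consistency and the textbook bias-variance split are written out here. The symbols $m_n$, $\bar m_n$, and $\mathcal{D}_n$ are introduced for illustration and may not match the paper's notation, and the paper's own decomposition (involving $a_n$, $q_n$, $\alpha_n$) is likely finer than this generic one.

```latex
% m(x) = E[Y | X = x] is the regression function (the CEF);
% m_n is an estimate built from the training sample D_n = {(X_i, Y_i)}_{i=1}^n,
% and X is a fresh draw, independent of D_n, from the same distribution as X_1.
\lim_{n \to \infty} \mathbb{E}\big[(m_n(X) - m(X))^2\big] = 0
\qquad \text{($L^2$ consistency)}

% With \bar m_n(x) = E_{D_n}[m_n(x)], the cross term vanishes because
% E_{D_n}[m_n(x) - \bar m_n(x)] = 0 for every fixed x:
\mathbb{E}\big[(m_n(X) - m(X))^2\big]
  = \underbrace{\mathbb{E}\big[(m_n(X) - \bar m_n(X))^2\big]}_{\text{variance}}
  + \underbrace{\mathbb{E}\big[(\bar m_n(X) - m(X))^2\big]}_{\text{squared bias}}
```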

Abstract

Despite their performance and widespread use, little is known about the theory of Random Forests. A major unanswered question is whether, or when, the Random Forest algorithm is consistent. The literature explores various variants of the classic Random Forest algorithm to address this question and known shortcomings of the method. This paper is a contribution to this literature. Specifically, it explores the suitability of grafting consistent estimators onto a shallow CART. It is shown that this approach has a consistency guarantee and performs well in empirical settings.
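
The grafting idea can be illustrated with a minimal, hypothetical sketch. It is not the paper's Algorithm (C) or (C*): it assumes the grafted leaf estimator is a k-nearest-neighbour regressor restricted to each leaf's training points (classical k-NN regression is consistent when $k \to \infty$ and $k/n \to 0$), and it assumes the ensemble simply averages trees fit on random subsamples. The names `best_split`, `GraftedTree`, and `grafted_forest_predict` and the parameters `depth`, `min_leaf`, `k`, `frac` are illustrative choices, and only NumPy is used.

```python
import numpy as np


def best_split(X, y, min_leaf):
    """Exhaustive CART split search: return the (feature, threshold) pair that
    minimizes the total within-child sum of squared errors, or None if no
    valid split improves on the parent node."""
    n, d = X.shape
    best, best_sse = None, np.sum((y - y.mean()) ** 2)
    for j in range(d):
        order = np.argsort(X[:, j])
        xs, ys = X[order, j], y[order]
        s1, s2 = np.cumsum(ys), np.cumsum(ys ** 2)
        for i in range(min_leaf, n - min_leaf + 1):  # i sorted points go left
            if xs[i - 1] == xs[i]:
                continue  # tied feature values cannot be separated
            sse_left = s2[i - 1] - s1[i - 1] ** 2 / i
            sse_right = (s2[-1] - s2[i - 1]) - (s1[-1] - s1[i - 1]) ** 2 / (n - i)
            if sse_left + sse_right < best_sse:
                best_sse = sse_left + sse_right
                best = (j, 0.5 * (xs[i - 1] + xs[i]))
    return best


class GraftedTree:
    """Hypothetical sketch: a shallow CART whose leaves are replaced ("grafted")
    by k-NN regressors that only see the training points in that leaf."""

    def __init__(self, depth=2, min_leaf=20, k=10):
        self.depth, self.min_leaf, self.k = depth, min_leaf, k

    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), np.asarray(y, float)
        self.root = self._grow(np.arange(len(self.y)), 0)
        return self

    def _grow(self, idx, level):
        split = None
        if level < self.depth:
            split = best_split(self.X[idx], self.y[idx], self.min_leaf)
        if split is None:
            return {"leaf": idx}  # keep the leaf's training indices for the k-NN graft
        j, t = split
        mask = self.X[idx, j] <= t
        return {"feature": j, "threshold": t,
                "left": self._grow(idx[mask], level + 1),
                "right": self._grow(idx[~mask], level + 1)}

    def predict_one(self, x):
        node = self.root
        while "leaf" not in node:  # route x down the shallow CART
            go_left = x[node["feature"]] <= node["threshold"]
            node = node["left"] if go_left else node["right"]
        idx = node["leaf"]
        dist = np.sum((self.X[idx] - x) ** 2, axis=1)
        nearest = idx[np.argsort(dist)[: self.k]]  # k-NN restricted to this leaf
        return self.y[nearest].mean()

    def predict(self, X):
        return np.array([self.predict_one(np.asarray(x, float)) for x in X])


def grafted_forest_predict(X, y, X_new, n_trees=10, frac=0.5, seed=0, **tree_kw):
    """Average GraftedTree predictions over random subsamples (assumed ensembling scheme)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_trees):
        sub = rng.choice(len(y), size=int(frac * len(y)), replace=False)
        preds.append(GraftedTree(**tree_kw).fit(X[sub], y[sub]).predict(X_new))
    return np.mean(preds, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(500, 2))
    y = 10.0 * (X[:, 0] > 0.5) + rng.normal(scale=0.1, size=500)
    tree = GraftedTree(depth=1, min_leaf=20, k=10).fit(X, y)
    print(tree.root["feature"], tree.root["threshold"])
    print(grafted_forest_predict(X, y, [[0.9, 0.5], [0.1, 0.5]], depth=1, min_leaf=20, k=10))
```

With the step function in the usage lines, the depth-1 CART should split on the first coordinate near 0.5, and the grafted k-NN leaves should then predict values close to 10 and 0 on either side. In this sketch the shallow CART only decides which coordinates to split on, while the grafted leaf estimator is the part that would need to be consistent.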

Paper Structure

This paper contains 16 sections, 34 equations, and 10 figures.

Figures (10)

  • Figure 1: Contours of the Biau CEF, the Random Forest estimate, and the estimate of the variant studied in this paper. The bottom-left and top-right squares contain infinitely many stripes of decreasing width, so the CART algorithm will never split the checkerboard pattern in the middle square.
  • Figure 2: Plots for $m(x) = 100 \sin(200 x^{(1)}x^{(2)})$
  • Figure 3: Plots for $m(x) = 100 (x^{(1)})^4$
  • Figure 4: Plots for $m(x) = \cos(30(x^{(3)})^3)$
  • Figure 5: Plots for $m(x) = \cos(200x^{(1)} + x^{(2)})+x^{(3)}$
  • ...and 5 more figures
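
The captions of Figures 2 to 5 name the regression functions used in the simulations, where $x^{(j)}$ denotes the $j$-th coordinate of $x$. A hypothetical data generator for these functions is sketched here; the dimension `d`, the uniform design on $[0,1]^d$, and the Gaussian noise level `noise_sd` are assumptions, since the captions do not state them.

```python
import numpy as np

# Regression functions from the captions of Figures 2-5.
# Column x[:, j - 1] plays the role of the coordinate x^{(j)}.
CEFS = {
    "fig2": lambda x: 100 * np.sin(200 * x[:, 0] * x[:, 1]),
    "fig3": lambda x: 100 * x[:, 0] ** 4,
    "fig4": lambda x: np.cos(30 * x[:, 2] ** 3),
    "fig5": lambda x: np.cos(200 * x[:, 0] + x[:, 1]) + x[:, 2],
}


def simulate(name, n, d=5, noise_sd=1.0, seed=0):
    """Draw (X, Y) with X ~ Uniform[0, 1]^d and Y = m(X) + N(0, noise_sd^2).
    The design, dimension, and noise level are assumed, not taken from the paper."""
    if d < 3:
        raise ValueError("Figures 4 and 5 use x^(3), so d must be at least 3")
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, d))
    y = CEFS[name](X) + rng.normal(scale=noise_sd, size=n)
    return X, y


X, y = simulate("fig3", n=1000)
```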