Consistency of Random Forest Type Algorithms under a Probabilistic Impurity Decrease Condition

Ricardo Blum, Munir Hiabu, Enno Mammen, Joseph T. Meyer

TL;DR

The paper derives a unifying theorem that establishes consistency for a broad class of tree-based algorithms. The theorem rests on extending the recently introduced notion of sufficient impurity decrease to a probabilistic sufficient impurity decrease condition.

Abstract

This paper derives a unifying theorem establishing consistency results for a broad class of tree-based algorithms. It improves on existing results in two respects. First, it applies to algorithms that differ from traditional Random Forests through additional randomness in choosing splits, extended split options, partitions into more than two cells in a single iteration step, and combinations of these. In particular, we prove consistency for Extremely Randomized Trees, Interaction Forests and Oblique Regression Trees using our general theorem. Second, if one allows for additional random splits, it can be used to demonstrate consistency for a larger function class than previous results on Random Forests cover. Our results are based on extending the recently introduced notion of sufficient impurity decrease to a probabilistic sufficient impurity decrease condition.
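To make the quantity behind the abstract concrete, the following is a minimal sketch of an empirical impurity decrease for a single split, using the standard variance-based (CART) criterion. It is an illustration under stated assumptions, not the paper's Definitions 1 and 2: the data are one-dimensional, the normalization by cell size is one common convention, and all function names (`sse`, `empirical_impurity_decrease`, `best_cart_split`, `best_random_split`) are hypothetical. The last function mimics, in spirit only, the extra randomness in split selection that the abstract mentions for Extremely Randomized Trees.

```python
import random


def sse(ys):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def empirical_impurity_decrease(xs, ys, threshold):
    """Variance-type impurity decrease from splitting a 1-D cell at `threshold`.

    Assumption: decrease = (SSE(cell) - SSE(left) - SSE(right)) / n,
    the usual CART criterion normalized by the number of points in the cell.
    """
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return (sse(ys) - sse(left) - sse(right)) / len(ys)


def best_cart_split(xs, ys):
    """Exhaustive CART-style search over midpoints of sorted distinct x-values."""
    pts = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    return max(candidates, key=lambda c: empirical_impurity_decrease(xs, ys, c))


def best_random_split(xs, ys, n_candidates, rng):
    """Randomized variant: draw thresholds uniformly in the cell, keep the best."""
    lo, hi = min(xs), max(xs)
    candidates = [rng.uniform(lo, hi) for _ in range(n_candidates)]
    return max(candidates, key=lambda c: empirical_impurity_decrease(xs, ys, c))
```

On a noiseless step function, for instance `xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]` with responses `0` on the left half and `1` on the right half, the exhaustive search places the split between 0.4 and 0.6. A randomized search can never achieve a larger decrease than the exhaustive one on the same cell, since every threshold induces either a trivial partition or one of the midpoint partitions.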

Paper Structure

This paper contains 28 sections, 16 theorems, 101 equations, 9 figures, 1 table.

Key Result

Theorem 7

Suppose that the probabilistic SID Condition cond:sid-general holds with $\delta \geq 1-L^{-1}$. Furthermore, assume Conditions cond:C2, cond:C3, cond:C4 and cond:T_dim, cond:T_boundary. Then, there exists $c > 0$, such that for $k = k_n < c\log(n)$ with $k\to \infty$,

Figures (9)

  • Figure 1: Illustration of $\rho(t,t')$ (blue parts) where $t=[0,1]^2$ and $t'$ is the area above the diagonal line.
  • Figure 2: Illustration of candidate splits in Interaction Forests. Adapted from interactionforests
  • Figure 3: Example of a partition corresponding to a tree in Interaction Forests.
  • Figure 4: Illustration of RSRF. Background trees illustrate other possible candidate partitions.
  • Figure 5: Illustration of the procedure used by RSRF for splitting a cell $t$ into $t_{11},t_{12},t_{21}, t_{22}$.
  • ...and 4 more figures

Theorems & Definitions (39)

  • Definition 1: Impurity decrease
  • Definition 2: Empirical impurity decrease
  • Definition 3: General tree estimator
  • Example 1: CART-split
  • Example 2: Example 1 (CART-split) continued
  • Definition 4: Grid, see also Chi, Chi
  • Definition 5: $\#$-Operator
  • Definition 6
  • Theorem 7: Consistency for general tree estimators
  • Remark 8
  • ...and 29 more