When does Subagging Work?

Christos Revelas; Otilia Boldea; Bas J. M. Werker

TL;DR

This work analyzes subagging (subsample aggregating) for CART regression trees. It gives sufficient conditions for pointwise consistency within a bias-variance framework in which the bias depends on cell diameter and the variance on the number of observations per cell, and it compares subagging with single trees across tree sizes. Subagging reduces variance while preserving bias when trees are grown under appropriate constraints, but a single tree grown at its optimal size can outperform subagging when the size of the subagged trees is not well chosen. The authors give practical guidance on tree size, propose enforcing consistency through a minimum number of observations per cell, and check robustness to the subsample size, to sampling with or without replacement, and to readily available implementations. The results indicate when and how practitioners should use subagging with regression trees to obtain stable, accurate predictions.
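The minimum-cell-count idea can be sketched in Python. This is a minimal illustration under assumptions of our own, not the authors' code: one covariate, splits chosen by the CART rule of minimum total within-cell sum of squared errors (SSE), and a hypothetical stopping rule that only allows splits leaving at least `min_leaf` observations in each child. The choice `min_leaf = ceil(n ** alpha)` with `alpha = 0.5`, the simulated data, and all function names are illustrative assumptions; the paper's exact rule may differ.

```python
import math
import random
from statistics import fmean


def sse(ys):
    """Sum of squared deviations from the cell mean."""
    m = fmean(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(xs, ys, min_leaf):
    """Threshold of the best CART split keeping >= min_leaf points per child.

    Returns None when no admissible split exists.
    """
    if min_leaf < 1:
        raise ValueError("min_leaf must be at least 1")
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best_t, best_loss = None, math.inf
    # Left child = pairs[:i], right child = pairs[i:]; both need >= min_leaf points.
    for i in range(min_leaf, n - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # tied x values cannot be separated
        loss = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if loss < best_loss:
            best_t = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_loss = loss
    return best_t


def grow(xs, ys, min_leaf):
    """Grow a tree until no split leaves >= min_leaf points on each side."""
    t = best_split(xs, ys, min_leaf)
    if t is None:
        return {"mean": fmean(ys), "n": len(ys)}
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return {
        "t": t,
        "left": grow([x for x, _ in left], [y for _, y in left], min_leaf),
        "right": grow([x for x, _ in right], [y for _, y in right], min_leaf),
    }


def predict(tree, x):
    """Follow the splits down to the cell (leaf) containing x."""
    while "t" in tree:
        tree = tree["left"] if x <= tree["t"] else tree["right"]
    return tree["mean"]


def leaf_sizes(tree):
    """Number of training observations in each leaf, left to right."""
    if "t" not in tree:
        return [tree["n"]]
    return leaf_sizes(tree["left"]) + leaf_sizes(tree["right"])


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    xs = [rng.random() for _ in range(n)]
    # Hypothetical regression function plus Gaussian noise.
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.3) for x in xs]
    alpha = 0.5  # hypothetical exponent for the minimum cell count
    min_leaf = math.ceil(n ** alpha)
    tree = grow(xs, ys, min_leaf)
    print(min_leaf, leaf_sizes(tree), round(predict(tree, 0.25), 3))
```

Letting the minimum grow with n keeps the number of observations in every cell growing, which is the variance side of the trade-off described in the abstract.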

Abstract

We study the effectiveness of subagging, or subsample aggregating, on regression trees, a popular non-parametric method in machine learning. First, we give sufficient conditions for pointwise consistency of trees. We formalize that (i) the bias depends on the diameter of cells, hence trees with few splits tend to be biased, and (ii) the variance depends on the number of observations in cells, hence trees with many splits tend to have large variance. While these statements for bias and variance are known to hold globally in the covariate space, we show that, under some constraints, they are also true locally. Second, we compare the performance of subagging to that of trees across different numbers of splits. We find that (1) for any given number of splits, subagging improves upon a single tree, and (2) this improvement is larger for many splits than it is for few splits. However, (3) a single tree grown at optimal size can outperform subagging if the size of its individual trees is not optimally chosen. This last result goes against the common practice of growing large randomized trees to eliminate bias and then averaging to reduce variance.
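The bias-variance comparison between a tree and its subagged version can be explored with a small Monte Carlo sketch in Python. Every design choice here is an assumption of ours, not the authors' simulation: one covariate uniform on [0, 1], a hypothetical regression function f(x) = x², Gaussian noise with standard deviation 0.3, a CART stump (one split) as the base tree, and subsamples of half the data drawn without replacement. It estimates the squared bias and variance at a single point x0 over repeated datasets $D_n$; its output should not be read as reproducing the paper's figures.

```python
import random
from statistics import fmean, pvariance


def fit_stump(xs, ys):
    """CART stump: the single split minimising total within-cell SSE.

    Returns (threshold, left_mean, right_mean). If all x values are tied,
    no split exists and the fit is the overall mean (threshold = inf).
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    total_sq = sum(y * y for _, y in pairs)
    best = (float("inf"), total / n, total / n)
    best_loss = float("inf")
    s = sq = 0.0  # running sums of y and y^2 over the left child
    for i in range(1, n):
        y_prev = pairs[i - 1][1]
        s += y_prev
        sq += y_prev * y_prev
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # tied x values cannot be separated
        # Child SSE = sum(y^2) - (sum y)^2 / count
        loss = (sq - s * s / i) + ((total_sq - sq) - (total - s) ** 2 / (n - i))
        if loss < best_loss:
            best_loss = loss
            t = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (t, s / i, (total - s) / (n - i))
    return best


def predict_stump(stump, x):
    t, left_mean, right_mean = stump
    return left_mean if x <= t else right_mean


def subagged_stump(xs, ys, x0, n_sub, n_bags, rng):
    """Average stump predictions at x0 over subsamples drawn without replacement."""
    preds = []
    for _ in range(n_bags):
        sample = rng.sample(range(len(xs)), n_sub)
        stump = fit_stump([xs[i] for i in sample], [ys[i] for i in sample])
        preds.append(predict_stump(stump, x0))
    return fmean(preds)


def bias2_and_variance(estimator, f, x0, n, reps, rng, noise=0.3):
    """Monte Carlo squared bias and variance at x0 over fresh datasets D_n."""
    estimates = []
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        ys = [f(x) + rng.gauss(0.0, noise) for x in xs]
        estimates.append(estimator(xs, ys))
    return (fmean(estimates) - f(x0)) ** 2, pvariance(estimates)


if __name__ == "__main__":
    rng = random.Random(42)
    x0, n = 0.5, 100

    def f(x):
        return x * x  # hypothetical regression function

    def single(xs, ys):
        return predict_stump(fit_stump(xs, ys), x0)

    def subagged(xs, ys):
        return subagged_stump(xs, ys, x0, n_sub=n // 2, n_bags=25, rng=rng)

    for name, est in [("stump", single), ("subagged stump", subagged)]:
        b2, var = bias2_and_variance(est, f, x0, n, reps=200, rng=rng)
        print(f"{name}: bias^2={b2:.4f} var={var:.4f} mse={b2 + var:.4f}")
```

Swapping the stump for a deeper tree would let one compare few and many splits, which is the comparison behind points (1) to (3) above.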

Paper Structure (25 sections, 9 theorems, 52 equations, 11 figures)

Introduction
Decision-Tree Methods for Regression
CART Criterion and the Location of Splits
Stopping Rules and Tree Size
Bias and Variance of Subagging
Pointwise Consistency of Trees and the Bias-Variance Trade-Off Associated with Tree Size
Subagging Consistent Trees
Subagging Small Trees
Analysis Conditionally on $D_n$
Statistical Bias and Variance
Subagging Large Trees
Optimal Number of Splits as a Function of the Dataset Size
Robustness
Subsample Size and Replacement
Readily Available Implementations
...and 10 more sections

Key Result

Proposition 1

The criterion (CART_criterion) can be re-written as
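Independently of the exact form of Proposition 1, a standard identity for the CART regression criterion (which may differ in normalization or notation from the proposition) is that splitting a cell of n observations into children with $n_L$ and $n_R$ observations decreases the sum of squared errors by $\frac{n_L n_R}{n}(\bar Y_L - \bar Y_R)^2$. The Python sketch below checks this numerically; the function names are hypothetical.

```python
from statistics import fmean


def sse(ys):
    """Within-cell sum of squared deviations from the cell mean."""
    m = fmean(ys)
    return sum((y - m) ** 2 for y in ys)


def sse_decrease(left, right):
    """CART gain of a split: parent SSE minus the two children's SSE."""
    return sse(left + right) - sse(left) - sse(right)


def gain_from_means(left, right):
    """Equivalent form: n_L * n_R / n * (mean_L - mean_R) ** 2."""
    n_l, n_r = len(left), len(right)
    return n_l * n_r / (n_l + n_r) * (fmean(left) - fmean(right)) ** 2


if __name__ == "__main__":
    left, right = [1.0, 2.0, 3.0], [6.0, 8.0]
    # Both expressions give the same gain for this split.
    print(sse_decrease(left, right), gain_from_means(left, right))
```

Because the parent's SSE does not depend on where the split is placed, maximizing this gain over candidate splits is the same as minimizing the children's total SSE.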

Figures (11)

Figure 1: (In)consistency of trees: on the $x$-axis is $n$; the solid (resp. dotted) line represents the sample mean (resp. mean $\pm$ one standard deviation) of the tree estimate for $f(x_0)$ (grey line) in each scenario a), b) and c).
Figure 2: Bias-variance trade-off associated with tree size.
Figure 3: Subagging consistent trees: on the $x$-axis is $n$; the solid (respectively dashed) line represents the squared bias (left plot), variance (middle) and mean squared error (right) of the tree (respectively subagged tree) estimates for $f(x_0)$ when consistently grown ($\alpha=0.65$).
Figure 4: First plot (left): stump (blue) and subagged stump (red) estimates as a function of $x_0$ for one realization of $D_n$ (gray points). In black is the true regression function $f(x_0)$. Second plot: weights $W_{n,i}(x_0)$ (stump, blue) and $W^*_{n,i}(x_0)$ (subagged stump, red) for $x_0=0.1$. Third plot: weights for $x_0=0.5$. Fourth plot: weights for $x_0=0.6$.
Figure 5: Squared bias, variance and mean squared error of a stump (blue) and subagged stump (red) as a function of $x_0$. The vertical line shows the theoretical split ($x=0.64$).
...and 6 more figures
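The weights shown in Figure 4 can be illustrated with a short Python sketch. It assumes the usual representation of a tree as a local average, $\hat f(x_0) = \sum_i W_{n,i}(x_0) Y_i$, where $W_{n,i}(x_0)$ is one over the number of observations in the cell containing $x_0$ for points in that cell and zero otherwise, and it takes the subagged weights $W^*_{n,i}(x_0)$ to be the average over subsamples of the subsample-stump weights (zero for points not drawn). These definitions, the simulated data and all function names are assumptions, not taken from the paper.

```python
import random
from statistics import fmean


def stump_threshold(xs, ys):
    """Threshold of the CART stump (minimum total within-cell SSE)."""
    pairs = sorted(zip(xs, ys))
    best_t, best_loss = float("inf"), float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # tied x values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml, mr = fmean(left), fmean(right)
        loss = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if loss < best_loss:
            best_t, best_loss = (pairs[i - 1][0] + pairs[i][0]) / 2, loss
    return best_t


def stump_weights(xs, ys, x0, idx=None):
    """W_i(x0) = 1{X_i in the cell of x0} / (number of points in that cell).

    Only indices in `idx` (default: all) are used to fit; others get weight 0.
    """
    idx = list(range(len(xs))) if idx is None else idx
    t = stump_threshold([xs[i] for i in idx], [ys[i] for i in idx])
    same_cell = [i for i in idx if (xs[i] <= t) == (x0 <= t)]
    w = [0.0] * len(xs)
    for i in same_cell:
        w[i] = 1.0 / len(same_cell)
    return w


def subagged_weights(xs, ys, x0, n_sub, n_bags, rng):
    """W*_i(x0): average of subsample-stump weights (0 when i is not drawn)."""
    total = [0.0] * len(xs)
    for _ in range(n_bags):
        sample = rng.sample(range(len(xs)), n_sub)
        for i, w in enumerate(stump_weights(xs, ys, x0, sample)):
            total[i] += w / n_bags
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.random() for _ in range(60)]
    # Hypothetical data: a jump at 0.5 plus Gaussian noise.
    ys = [float(x > 0.5) + rng.gauss(0.0, 0.2) for x in xs]
    x0 = 0.6
    w = stump_weights(xs, ys, x0)
    w_star = subagged_weights(xs, ys, x0, n_sub=30, n_bags=50, rng=rng)
    print("stump estimate:", sum(a * b for a, b in zip(w, ys)))
    print("subagged estimate:", sum(a * b for a, b in zip(w_star, ys)))
    print("nonzero weights:", sum(v > 0 for v in w), sum(v > 0 for v in w_star))
```

Both weight vectors sum to one. The subagged weights are typically nonzero on more observations, since different subsamples can place the split at different locations.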

Theorems & Definitions (9)

Proposition 1
Proposition 2
Proposition 3
Theorem 1
Lemma 1
Lemma 2
Lemma 3
Lemma 4
Lemma 5
