Bagging Provides Assumption-free Stability
Jake A. Soloff, Rina Foygel Barber, Rebecca Willett
TL;DR
This work asks how stable bagging is when applied to an arbitrary base algorithm, with no distributional assumptions. It introduces average-case stability and proves a finite-sample guarantee for derandomized bagging (and variants) with bounded outputs: stability holds whenever $\delta\varepsilon^2 \gtrsim \frac{1}{n}\cdot\frac{p}{1-p}$, with refinements involving the resampling covariance $q$. The results extend to unbounded outputs via data-dependent scaling and adaptive clipping. The guarantee is shown to be tight for subbagging, and worst-case stability is shown to be impossible to guarantee under the same conditions. Empirically, subbagging stabilizes highly unstable base learners across settings, which supports distribution-free uncertainty quantification and robust predictive intervals. Overall, the paper provides a principled, assumption-free stabilization mechanism for bagging, with broad implications for generalization and inference.
Abstract
Bagging is an important technique for stabilizing machine learning models. In this paper, we derive a finite-sample guarantee on the stability of bagging for any model. Our result places no assumptions on the distribution of the data, on the properties of the base algorithm, or on the dimensionality of the covariates. Our guarantee applies to many variants of bagging and is optimal up to a constant. Empirical results validate our findings, showing that bagging successfully stabilizes even highly unstable base algorithms.
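To make the stabilizing effect concrete, the following is a minimal sketch rather than the authors' experimental code. It assumes stability is measured by how much a prediction at a fixed test point changes when a single training point is removed, and it approximates derandomized subbagging with a finite number of bags. The 1-nearest-neighbor base learner, the function names, the synthetic data, and the values of `m`, `n_bags`, and `eps` are all hypothetical choices made for illustration.

```python
import numpy as np


def one_nn_predict(X, y, x_test):
    """Hypothetical unstable base learner: 1-nearest-neighbor on 1-D inputs."""
    return float(y[np.argmin(np.abs(X - x_test))])


def subbag_predict(X, y, x_test, m, n_bags, rng):
    """Average the base learner over n_bags subsamples of size m (no replacement)."""
    n = len(y)
    preds = np.empty(n_bags)
    for b in range(n_bags):
        idx = rng.choice(n, size=m, replace=False)
        preds[b] = one_nn_predict(X[idx], y[idx], x_test)
    return float(preds.mean())


def loo_changes(X, y, x_test, m, n_bags, seed=0):
    """Return |f(x) - f^{-i}(x)| for each i, where f^{-i} drops training point i."""
    rng = np.random.default_rng(seed)
    full = subbag_predict(X, y, x_test, m, n_bags, rng)
    n = len(y)
    changes = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        loo = subbag_predict(X[keep], y[keep], x_test, m, n_bags, rng)
        changes[i] = abs(full - loo)
    return changes


def unstable_fraction(changes, eps):
    """Fraction of removals that move the prediction by more than eps."""
    return float(np.mean(changes > eps))


# Synthetic data with outputs bounded in [0, 1] (an assumption of this sketch).
data_rng = np.random.default_rng(1)
n = 40
X = data_rng.uniform(-1.0, 1.0, size=n)
y = data_rng.integers(0, 2, size=n).astype(float)

changes = loo_changes(X, y, x_test=0.0, m=n // 2, n_bags=200)
print("fraction of removals changing the prediction by > 0.25:",
      unstable_fraction(changes, eps=0.25))
```

Because the number of bags is finite, the reported changes include Monte Carlo noise on top of the effect of removing a point; increasing `n_bags` brings the estimate closer to the derandomized quantity that the guarantee describes.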
