Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge Ensembles
Benjamin S. Ruben, Cengiz Pehlevan
TL;DR
This work develops a theory of feature-subsampled ridge ensembles trained on noisy, correlated data, using the replica trick to derive analytical learning curves. It shows that subsampling shifts the double-descent peak of a linear predictor and introduces heterogeneous connectivity, in which ensemble members are built on varying numbers of features, as a computationally efficient way to mitigate double descent. The authors further characterize the trade-off between ensembling and subsampling under resource constraints, revealing phase-like regimes that determine the optimal ensemble size and regularization. The results offer guidance for designing feature-subsampled ensembles in noisy, high-dimensional settings, and the qualitative conclusions carry over to linear classifiers applied to image classification tasks built on deep-learning features.
Abstract
Feature bagging is a well-established ensembling method which aims to reduce prediction variance by combining predictions of many estimators trained on subsets or projections of features. Here, we develop a theory of feature bagging in noisy least-squares ridge ensembles and simplify the resulting learning curves in the special case of equicorrelated data. Using analytical learning curves, we demonstrate that subsampling shifts the double-descent peak of a linear predictor. This leads us to introduce heterogeneous feature ensembling, with estimators built on varying numbers of feature dimensions, as a computationally efficient method to mitigate double descent. Then, we compare the performance of a feature-subsampling ensemble to a single linear predictor, describing a trade-off between noise amplification due to subsampling and noise reduction due to ensembling. Our qualitative insights carry over to linear classifiers applied to image classification tasks with realistic datasets constructed using a state-of-the-art deep learning feature map.
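
To make the setup concrete, the following is a minimal NumPy sketch of a feature-subsampled ridge ensemble, not the authors' implementation. Each member is a ridge regressor fit on a random subset of features, and the ensemble prediction is the plain average of member predictions. Passing different subset sizes gives a heterogeneous ensemble in the sense described above. The function names, the isotropic Gaussian data, the label-noise level, and the choice of uniform averaging are all illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Ridge weights: solve (X^T X + lam * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def fit_feature_bagged_ensemble(X, y, subset_sizes, lam, rng):
    """Fit one ridge estimator per random feature subset.

    subset_sizes may differ across members (heterogeneous ensemble)
    or all be equal (homogeneous ensemble).
    """
    n_features = X.shape[1]
    members = []
    for size in subset_sizes:
        # Each member sees only `size` features, drawn without replacement.
        idx = rng.choice(n_features, size=size, replace=False)
        members.append((idx, fit_ridge(X[:, idx], y, lam)))
    return members


def predict_ensemble(members, X):
    """Average the predictions of all members (uniform weights, an assumption)."""
    preds = [X[:, idx] @ w for idx, w in members]
    return np.mean(preds, axis=0)


# Synthetic linear task with label noise (illustrative values only).
rng = np.random.default_rng(0)
n_train, n_test, n_features = 60, 500, 100
w_true = rng.normal(size=n_features) / np.sqrt(n_features)
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
y_train = X_train @ w_true + 0.1 * rng.normal(size=n_train)
y_test = X_test @ w_true

# Heterogeneous ensemble: members use 20, 40, 60 and 80 features.
members = fit_feature_bagged_ensemble(
    X_train, y_train, subset_sizes=[20, 40, 60, 80], lam=1e-2, rng=rng
)
test_error = np.mean((predict_ensemble(members, X_test) - y_test) ** 2)
print(f"ensemble test MSE: {test_error:.4f}")
```

In this sketch the member with 60 features has as many features as training samples, which is where an unregularized single predictor would sit near its double-descent peak. Mixing subset sizes means that no single sample size places every member at that point, which is one way to read the motivation for heterogeneous ensembling.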
