(Implicit) Ensembles of Ensembles: Epistemic Uncertainty Collapse in Large Models
Andreas Kirsch
TL;DR
The paper investigates a paradox in uncertainty quantification: as models and ensembles grow, epistemic uncertainty can collapse, undermining the reliability of uncertainty estimates. It develops a theoretical framework around ensembles of ensembles and the implicit ensembling hypothesis, with connections to Neural Tangent Kernel theory. It then demonstrates the phenomenon on toy tasks, MNIST, CIFAR-10, and large vision models, including ResNets and Vision Transformers. A key result is that as the sub-ensemble size $M$ increases, $\text{MI}(Y; \boldsymbol{E}_I \mid \boldsymbol{x})$ tends to zero, indicating that disagreement between the sub-ensembles vanishes. Implicit ensemble extraction, in turn, can recover much of the lost uncertainty from a single large model. The work suggests that naive scaling does not guarantee better uncertainty estimates and offers practical techniques for recovering epistemic uncertainty, with significant implications for safety-critical applications and out-of-distribution detection.
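The collapse in the key result can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not the paper's experimental setup:

- The base members are random predictors with i.i.d. Gaussian logits rather than trained networks (an assumption for illustration only).
- $K = 8$ sub-ensembles of $M$ members each average their members' probabilities.
- The sub-ensemble index is uniform.

The mutual information between the label and the sub-ensemble index is then the entropy of the mean prediction minus the mean entropy, $\mathrm{H}[\tfrac{1}{K}\sum_k p_k] - \tfrac{1}{K}\sum_k \mathrm{H}[p_k]$. All function names and parameter values are hypothetical.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def mutual_information(dists):
    # MI(Y; I) with I uniform over the given predictive distributions:
    # entropy of the mean prediction minus the mean entropy.
    k, c = len(dists), len(dists[0])
    mean = [sum(d[j] for d in dists) / k for j in range(c)]
    return entropy(mean) - sum(entropy(d) for d in dists) / k


def member_prediction(rng, num_classes, noise):
    # Hypothetical base member: softmax of random Gaussian logits (not a trained model).
    return softmax([rng.gauss(0.0, noise) for _ in range(num_classes)])


def sub_ensemble(rng, m, num_classes, noise):
    # A sub-ensemble averages the predictive distributions of its m members.
    members = [member_prediction(rng, num_classes, noise) for _ in range(m)]
    return [sum(p[j] for p in members) / m for j in range(num_classes)]


def mi_between_sub_ensembles(m, k=8, num_classes=10, noise=2.0, seed=0):
    rng = random.Random(seed)
    subs = [sub_ensemble(rng, m, num_classes, noise) for _ in range(k)]
    return mutual_information(subs)


for m in (1, 4, 16, 64, 256):
    print(m, round(mi_between_sub_ensembles(m), 4))
```

As $M$ grows, every sub-ensemble averages more members, so its prediction concentrates around the same expected prediction. The printed mutual information therefore shrinks toward zero, even though the individual members disagree as much as before.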
Abstract
Epistemic uncertainty is crucial for safety-critical applications and data acquisition tasks. Yet, we find an important phenomenon in deep learning models: an epistemic uncertainty collapse as model complexity increases, challenging the assumption that larger models invariably offer better uncertainty quantification. We introduce implicit ensembling as a possible explanation for this phenomenon. To investigate this hypothesis, we provide theoretical analysis and experiments that demonstrate uncertainty collapse in explicit ensembles of ensembles and show experimental evidence of similar collapse in wider models across various architectures, from simple MLPs to state-of-the-art vision models including ResNets and Vision Transformers. We further develop implicit ensemble extraction techniques to decompose larger models into diverse sub-models, showing we can thus recover epistemic uncertainty. We explore the implications of these findings for uncertainty estimation.
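The abstract does not specify how implicit ensemble extraction works. As a purely hypothetical illustration of reading a wide model as an average of narrower sub-models, the sketch below splits the hidden units of a one-hidden-layer ReLU network into disjoint groups. Each sub-model keeps only its group's contribution to the logits, rescaled so that the sub-model logits average exactly to the full model's logits. This is not the paper's extraction procedure. Random weights stand in for a trained network, and every name here is made up for the example.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def mutual_information(dists):
    # Entropy of the mean prediction minus the mean entropy (uniform over sub-models).
    k, c = len(dists), len(dists[0])
    mean = [sum(d[j] for d in dists) / k for j in range(c)]
    return entropy(mean) - sum(entropy(d) for d in dists) / k


def make_wide_net(rng, in_dim, hidden, num_classes):
    # Hypothetical stand-in for a trained wide network: random one-hidden-layer ReLU MLP.
    w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
          for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
          for _ in range(num_classes)]
    return w1, w2


def hidden_activations(w1, x):
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]


def full_logits(net, x):
    w1, w2 = net
    h = hidden_activations(w1, x)
    return [sum(v * a for v, a in zip(row, h)) for row in w2]


def sub_model_logits(net, x, num_groups):
    # Split hidden units into disjoint contiguous groups. Each sub-model keeps only
    # its group's contribution, scaled by num_groups, so the average of the
    # sub-model logits equals the full model's logits.
    w1, w2 = net
    h = hidden_activations(w1, x)
    if len(h) % num_groups != 0:
        raise ValueError("hidden width must be divisible by num_groups")
    size = len(h) // num_groups
    subs = []
    for g in range(num_groups):
        idx = range(g * size, (g + 1) * size)
        subs.append([num_groups * sum(row[j] * h[j] for j in idx) for row in w2])
    return subs


rng = random.Random(0)
net = make_wide_net(rng, in_dim=5, hidden=512, num_classes=10)
x = [rng.gauss(0.0, 1.0) for _ in range(5)]
full = full_logits(net, x)
subs = sub_model_logits(net, x, num_groups=8)
mean_sub = [sum(s[c] for s in subs) / len(subs) for c in range(len(full))]
print(max(abs(a - b) for a, b in zip(mean_sub, full)))  # only floating-point error
print(mutual_information([softmax(s) for s in subs]))    # disagreement among sub-models
```

A single forward pass of the full model yields one softmax and hence no disagreement term. The group-wise reading instead exposes spread across the sub-models. Averaging logits is not the same as averaging probabilities, so this is only one possible way to view a wide network as an implicit ensemble.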
