How Good is a Single Basin?
Kai Lion, Lorenzo Noci, Thomas Hofmann, Gregor Bachmann
TL;DR
The paper asks whether ensembles drawn from a single loss basin can match the predictive performance and calibration of traditional deep ensembles, whose members come from multiple basins. It systematically builds connected ensembles within one basin using methods such as SWE and constrained training, and shows that increased connectivity often reduces diversity unless cross-basin information is incorporated. Of the two ways explored to bring in that information, permutation-based alignment and distillation from multi-basin ensembles, distillation is the more effective: much of the information from other basins can be re-discovered inside a single basin, yielding performance competitive with, or close to, deep ensembles. The findings suggest that a single basin already contains substantial cross-basin knowledge, and they motivate distillation-based strategies for harnessing it without leaving the basin, with implications for efficiency and for architecture-dependent behavior (e.g., ViTs).
Abstract
The multi-modal nature of neural loss landscapes is often considered to be the main driver behind the empirical success of deep ensembles. In this work, we probe this belief by constructing various "connected" ensembles which are restricted to lie in the same basin. Through our experiments, we demonstrate that increased connectivity indeed negatively impacts performance. However, when incorporating the knowledge from other basins implicitly through distillation, we show that the gap in performance can be mitigated by re-discovering (multi-basin) deep ensembles within a single basin. Thus, we conjecture that while the extra-basin knowledge is at least partially present in any given basin, it cannot be easily harnessed without learning it from other basins.
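To make the distillation idea concrete, here is a minimal sketch of one common way to distill an ensemble: the members' softmax outputs are averaged into a teacher distribution, and a student is scored by its cross-entropy against that average. This section does not specify the paper's exact objective, so this formulation is an assumption, and the function names (`ensemble_probs`, `distillation_loss`) and the toy logits are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_probs(member_logits):
    """Average the softmax outputs of all ensemble members (one logit vector each)."""
    probs = [softmax(logits) for logits in member_logits]
    n = len(probs)
    return [sum(p[k] for p in probs) / n for k in range(len(probs[0]))]

def distillation_loss(student_logits, teacher_probs):
    """Cross-entropy of the student's prediction against the teacher distribution.

    Assumed objective for illustration; not taken from the paper.
    """
    student = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student) if t > 0)

# Toy example: three hypothetical ensemble members (e.g. from different basins)
# predicting over three classes for a single input.
teacher_logits = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5], [2.5, 0.0, -1.5]]
teacher = ensemble_probs(teacher_logits)

loss_far = distillation_loss([0.0, 0.0, 0.0], teacher)    # uninformed (uniform) student
loss_near = distillation_loss([2.0, 0.5, -1.0], teacher)  # student close to the teacher
```

In this toy setting the student that resembles the ensemble members incurs a lower loss than the uniform one, which is the signal a single-basin student would follow during training to absorb the ensemble's predictions.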
