To Stay or Not to Stay in the Pre-train Basin: Insights on Ensembling in Transfer Learning
Ildus Sadrtdinov, Dmitrii Pozdeev, Dmitry Vetrov, Ekaterina Lobacheva
TL;DR
The paper addresses how to build high-quality ensembles in transfer learning when only a single pre-trained checkpoint is available. It analyzes local and semi-local ensembling methods and finds that exploring the pre-train basin with Snapshot Ensembles (SSE) helps, whereas leaving the basin forfeits the benefits of transfer learning. To resolve this, it introduces StarSSE, a parallel, star-shaped extension of SSE that preserves the transfer advantages while yielding diverse, high-quality models; it also shows that StarSSE ensembles produce strong model soups. Across medium- and large-scale tasks, StarSSE consistently outperforms standard SSE and Local DE, both as an ensemble and as a soup, with notable improvements in robustness and scalability. The work advances practical, compute-efficient ensembling for transfer learning and offers guidance on exploiting loss-landscape structure when fine-tuning from a single checkpoint.
Abstract
Transfer learning and ensembling are two popular techniques for improving the performance and robustness of neural networks. Due to the high cost of pre-training, ensembles of models fine-tuned from a single pre-trained checkpoint are often used in practice. Such models end up in the same basin of the loss landscape, which we call the pre-train basin, and thus have limited diversity. In this work, we show that ensembles trained from a single pre-trained checkpoint may be improved by better exploring the pre-train basin; however, leaving the basin results in losing the benefits of transfer learning and in degradation of the ensemble quality. Based on the analysis of existing exploration methods, we propose StarSSE, a more effective modification of Snapshot Ensembles (SSE) for the transfer learning setup, which results in stronger ensembles and uniform model soups.
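The abstract contrasts two ways of combining fine-tuned models: an ensemble, which averages predictions, and a uniform model soup, which averages weights into a single model. The sketch below illustrates only that distinction on toy linear classifiers. It is not the StarSSE training procedure: the names `ensemble_predict` and `uniform_soup`, the random "fine-tuned" weights, and the perturbation scale are all illustrative assumptions.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def ensemble_predict(weights, x):
    # Ensemble: run every member, then average the class probabilities.
    probs = [softmax(x @ w) for w in weights]
    return np.mean(probs, axis=0)


def uniform_soup(weights):
    # Uniform soup: average the parameters, producing one model.
    return np.mean(weights, axis=0)


rng = np.random.default_rng(0)

# Hypothetical stand-in for models fine-tuned from one checkpoint:
# small perturbations around a shared set of weights (assumption).
center = rng.normal(size=(4, 3))
members = [center + 0.1 * rng.normal(size=center.shape) for _ in range(5)]

x = rng.normal(size=(8, 4))  # toy batch: 8 inputs, 4 features

ens_probs = ensemble_predict(members, x)          # costs 5 forward passes
soup_probs = softmax(x @ uniform_soup(members))   # costs 1 forward pass
```

The practical difference is inference cost: the ensemble needs one forward pass per member, while the soup needs a single pass. Weight averaging is generally expected to work only when the members lie close enough in weight space, which is why the soup quality depends on how the members were obtained.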
