Snapshot Ensembles: Train 1, get M for free
Gao Huang, Yixuan Li, Geoff Pleiss, Zhuang Liu, John E. Hopcroft, Kilian Q. Weinberger
TL;DR
Ensembling improves neural network generalization, but training several networks is costly. The authors propose Snapshot Ensembling, which uses a cyclic cosine learning rate schedule to drive a single model into multiple local minima, saving a snapshot of the weights at each one and ensembling the snapshots at no extra training cost. Across CIFAR, SVHN, Tiny ImageNet, and ImageNet, using ResNet, DenseNet, and Wide-ResNet, the method yields consistent accuracy gains: DenseNet Snapshot Ensembles reach 3.4% error on CIFAR-10 and 17.4% on CIFAR-100, and the ImageNet results are competitive (M=2). Analyses show that the snapshots are diverse and make complementary predictions, which is what makes ensembling them worthwhile.
Abstract
Ensembles of neural networks are known to be much more robust and accurate than individual networks. However, training multiple deep networks for model averaging is computationally expensive. In this paper, we propose a method to obtain the seemingly contradictory goal of ensembling multiple neural networks at no additional training cost. We achieve this goal by training a single neural network, converging to several local minima along its optimization path and saving the model parameters. To obtain repeated rapid convergence, we leverage recent work on cyclic learning rate schedules. The resulting technique, which we refer to as Snapshot Ensembling, is simple, yet surprisingly effective. We show in a series of experiments that our approach is compatible with diverse network architectures and learning tasks. It consistently yields lower error rates than state-of-the-art single models at no additional training cost, and compares favorably with traditional network ensembles. On CIFAR-10 and CIFAR-100 our DenseNet Snapshot Ensembles obtain error rates of 3.4% and 17.4% respectively.
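The training recipe described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes the cyclic schedule is a cosine annealing curve restarted at the start of each of M equal-length cycles, that a snapshot is saved at the end of every cycle, and that the ensemble prediction is the plain average of the snapshots' class probabilities. The function names (`snapshot_lr`, `is_snapshot_step`, `average_predictions`) and the example values are hypothetical.

```python
import math


def snapshot_lr(t, total_iters, num_cycles, lr_max):
    """Learning rate at 1-indexed iteration t (assumed cosine schedule with restarts).

    The rate starts at lr_max at the beginning of each cycle and decays
    towards zero by the end of the cycle.
    """
    cycle_len = math.ceil(total_iters / num_cycles)
    pos = (t - 1) % cycle_len  # position inside the current cycle
    return 0.5 * lr_max * (math.cos(math.pi * pos / cycle_len) + 1.0)


def is_snapshot_step(t, total_iters, num_cycles):
    """True at the last iteration of each cycle, where the rate is lowest."""
    cycle_len = math.ceil(total_iters / num_cycles)
    return t % cycle_len == 0 or t == total_iters


def average_predictions(snapshot_probs):
    """Average per-class probabilities over snapshots (assumed ensembling rule)."""
    m = len(snapshot_probs)
    num_classes = len(snapshot_probs[0])
    return [sum(p[k] for p in snapshot_probs) / m for k in range(num_classes)]


# Hypothetical budget: 300 iterations split into M = 6 cycles.
snapshot_steps = [t for t in range(1, 301) if is_snapshot_step(t, 300, 6)]
ensemble = average_predictions([[0.9, 0.1], [0.5, 0.5]])
```

In a real training loop, the model weights would be copied at each step in `snapshot_steps`, and at test time each saved copy would produce one row of the input to `average_predictions`. The total number of iterations equals that of a single standard training run, which is where the "no additional training cost" claim comes from.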
