FreezeOut: Accelerate Training by Progressively Freezing Layers
Andrew Brock, Theodore Lim, J. M. Ritchie, Nick Weston
TL;DR
FreezeOut reduces training cost by progressively freezing the early layers of a neural network and excluding them from backpropagation. Each layer gets its own cosine-annealed learning-rate schedule that decays to zero at a layer-dependent point in training; once a layer's learning rate reaches zero, the layer is frozen. Experiments on CIFAR with DenseNet, WideResNet, and VGG show that the benefit depends on the architecture: up to 20% savings in wall-clock training time at a 3% loss in accuracy for DenseNets, a 20% speedup with no loss in accuracy for ResNets, and no improvement for VGG. The authors provide practical defaults (cubic scheduling with learning-rate scaling and t0 ≈ 0.512) and public PyTorch code, and position FreezeOut as a viable speedup for prototyping and resource-constrained training.
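The per-layer schedule described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' code: the function names `freeze_points` and `layer_lr` are hypothetical, and the linear spacing of freeze points, the cubing step, and the 1/t_i learning-rate scaling are assumptions about how "cubic scheduling with LR scaling" is defined. Under those assumptions, a pre-cubing value of 0.8 gives the first layer a freeze point of 0.8^3 = 0.512.

```python
import math


def freeze_points(num_layers, t0=0.8, cubic=True):
    """Fraction of training at which each layer freezes (first layer earliest).

    Assumption: freeze points are spaced linearly from t0 to 1.0 and then
    cubed when cubic=True, so the last layer always trains to the end.
    """
    if num_layers == 1:
        return [1.0]
    points = []
    for i in range(num_layers):
        t = t0 + (1.0 - t0) * i / (num_layers - 1)
        points.append(t ** 3 if cubic else t)
    return points


def layer_lr(base_lr, t, t_i, scale_lr=True):
    """Cosine-annealed learning rate for one layer at training fraction t.

    The rate decays from its initial value to zero at t = t_i and stays at
    zero afterwards, which is the point where the layer counts as frozen.
    Assumption: with scale_lr=True the initial rate is base_lr / t_i, so
    layers that freeze earlier start with a larger rate.
    """
    if t >= t_i:
        return 0.0  # layer is frozen
    lr0 = base_lr / t_i if scale_lr else base_lr
    return 0.5 * lr0 * (1.0 + math.cos(math.pi * t / t_i))


if __name__ == "__main__":
    points = freeze_points(num_layers=5, t0=0.8)
    for step in (0.0, 0.25, 0.5, 0.75, 1.0):
        rates = [layer_lr(0.1, step, t_i) for t_i in points]
        print(step, [round(r, 4) for r in rates])
```

Running the script prints one row per training fraction, with earlier layers reaching a rate of zero before later ones.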
Abstract
The early layers of a deep neural net have the fewest parameters, but take up the most computation. In this extended abstract, we propose to only train the hidden layers for a set portion of the training run, freezing them out one-by-one and excluding them from the backward pass. Through experiments on CIFAR, we empirically demonstrate that FreezeOut yields savings of up to 20% wall-clock time during training with 3% loss in accuracy for DenseNets, a 20% speedup without loss of accuracy for ResNets, and no improvement for VGG networks. Our code is publicly available at https://github.com/ajbrock/FreezeOut
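As a rough illustration of what "excluding them from the backward pass" can look like in PyTorch, the sketch below freezes a prefix of a small `nn.Sequential` model by turning off `requires_grad` on its parameters. This is one possible mechanism and is not taken from the authors' repository; the helper `freeze_prefix` and the toy model are hypothetical. Because neither the input nor the frozen parameters require gradients, autograd records no graph for the frozen prefix, so the backward pass stops at the first trainable layer.

```python
import torch
from torch import nn


def freeze_prefix(model: nn.Sequential, num_frozen: int) -> None:
    """Disable gradients for the first `num_frozen` modules of the model."""
    for layer in list(model)[:num_frozen]:
        layer.requires_grad_(False)


torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 16),
    nn.ReLU(),
    nn.Linear(16, 2),
)

# Freeze the first Linear layer and its activation.
freeze_prefix(model, 2)

x = torch.randn(4, 8)
loss = model(x).sum()
loss.backward()

# Frozen parameters receive no gradient; later layers still do.
frozen_has_grad = model[0].weight.grad is not None
trainable_has_grad = model[4].weight.grad is not None
```

In a full training loop, a call like this would be made each time a layer's learning rate reaches zero, so the frozen prefix grows as training proceeds.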
