ExpTest: Automating Learning Rate Searching and Tuning with Insights from Linearized Neural Networks

Zan Chaudhry, Naoko Mizuno

TL;DR

This work presents ExpTest, a sophisticated method for initial learning rate searching and subsequent learning rate tuning for the training of DNNs that requires minimal overhead, is robust to hyperparameter choice, and achieves state-of-the-art performance on a variety of tasks and architectures without initial learning rate selection or learning rate scheduling.

Abstract

Hyperparameter tuning remains a significant challenge for the training of deep neural networks (DNNs), requiring manual and/or time-intensive grid searches, increasing resource costs and presenting a barrier to the democratization of machine learning. The global initial learning rate for DNN training is particularly important. Several techniques have been proposed for automated learning rate tuning during training; however, they still require manual searching for the global initial learning rate. Though methods exist that do not require this initial selection, they suffer from poor performance. Here, we present ExpTest, a sophisticated method for initial learning rate searching and subsequent learning rate tuning for the training of DNNs. ExpTest draws on insights from linearized neural networks and the form of the loss curve, which we treat as a real-time signal upon which we perform hypothesis testing. We mathematically justify ExpTest and provide empirical support. ExpTest requires minimal overhead, is robust to hyperparameter choice, and achieves state-of-the-art performance on a variety of tasks and architectures, without initial learning rate selection or learning rate scheduling.

Paper Structure

This paper contains 21 sections, 79 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: Training loss curves for logistic regression on MNIST, smoothed with a moving average over a window of 1 epoch. The center line shows the mean of 5 trials, with standard error shown by shading. Optimizers with high loss are excluded from the legend for clarity.
  • Figure 2: Training loss curves on MNIST logistic regression for ExpTest ($\alpha = 0.05$, $\beta = 0.33$) at three different mini-batch sizes. Lines represent mean of 5 trials with standard error shown by shading.
  • Figure 3: Training loss curves for regression with a fully connected network on the California Housing Dataset. The center line shows the mean of 5 trials, with standard error shown by shading. The $y$-axis is capped at 0.5 for clarity.
  • Figure 4: Training loss curves for a modified VGG-16 on CIFAR-10, smoothed with a moving average over a window of 1 epoch. The center line shows the mean of 5 trials, with standard error shown by shading. Optimizers with high loss are excluded from the legend for clarity.
  • Figure 5: Test MSE losses for model width-depth combinations in the lattice $w=[32, 512, 1024]$, $d=[2,8,32]$ over the displayed learning rate range. Shapes correspond to width and colors to depth. Values are the mean of 5 trials.
  • ...and 1 more figure
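
The captions above describe loss curves that are smoothed with a moving average over one epoch and then summarized as a mean across trials with a standard-error band. A minimal sketch of that kind of summary is given below. It is not the authors' plotting code: the function names, the trailing (rather than centered) window, the sample-standard-deviation form of the standard error, and the toy loss values are all assumptions made for illustration.

```python
import math
import statistics


def moving_average(values, window):
    """Trailing moving average; early points average over what is available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


def mean_and_standard_error(trials):
    """Per-step mean and standard error across equally long trial curves."""
    n = len(trials)
    means, errors = [], []
    for step_values in zip(*trials):
        means.append(statistics.fmean(step_values))
        # Standard error = sample standard deviation / sqrt(number of trials).
        errors.append(statistics.stdev(step_values) / math.sqrt(n))
    return means, errors


# Hypothetical per-step losses: 3 trials (the captions report 5),
# with 2 optimizer steps per "epoch" to keep the example small.
steps_per_epoch = 2
trials = [
    [1.0, 0.8, 0.6, 0.4],
    [1.2, 0.8, 0.4, 0.4],
    [0.8, 0.8, 0.8, 0.4],
]

smoothed = [moving_average(t, steps_per_epoch) for t in trials]
center, band = mean_and_standard_error(smoothed)
```

Here `center` would be drawn as the center line and `center[i] ± band[i]` as the shaded region. In practice the window would be the number of mini-batches per epoch for the dataset and batch size in use.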