Revisiting "Qualitatively Characterizing Neural Network Optimization Problems"

Jonathan Frankle

TL;DR

The paper revisits the observation that the loss along the line segment from initialization to the final trained weights is approximately convex, testing it on modern networks and datasets. Using four image-classification settings and 100 interpolation points parameterized by $x \in [0,1]$, it examines the loss and test error along the path. Contrary to the MNIST-era finding, it observes that in large-scale settings the loss stays close to its value at initialization for most of the path and falls only near the optimum. When interpolating from checkpoints partway through training rather than from initialization, loss barriers appear on the linear path to the final weights. These results imply that the simple convex-path picture does not generalize to modern architectures, though linear interpolation still serves as a qualitative diagnostic for optimization dynamics on current tasks.

Abstract

We revisit and extend the experiments of Goodfellow et al. (2014), who showed that - for then state-of-the-art networks - "the objective function has a simple, approximately convex shape" along the linear path between initialization and the trained weights. We do not find this to be the case for modern networks on CIFAR-10 and ImageNet. Instead, although loss is roughly monotonically non-increasing along this path, it remains high until close to the optimum. In addition, training quickly becomes linearly separated from the optimum by loss barriers. We conclude that, although Goodfellow et al.'s findings describe the "relatively easy to optimize" MNIST setting, behavior is qualitatively different in modern settings.
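The interpolation experiment described above can be sketched in a few lines. This is a minimal illustration rather than the paper's code: it assumes the weights are flattened into a single NumPy vector, and the names `interpolate`, `loss_along_path`, and `toy_loss` are hypothetical. The quadratic `toy_loss` is only a stand-in for a real network's loss, so the curve it produces is convex by construction and says nothing about the modern settings studied here.

```python
import numpy as np

def interpolate(theta_start, theta_end, x):
    """Return the point (1 - x) * theta_start + x * theta_end on the linear path."""
    return (1.0 - x) * theta_start + x * theta_end

def loss_along_path(loss_fn, theta_start, theta_end, num_points=100):
    """Evaluate loss_fn at num_points evenly spaced values of x in [0, 1]."""
    xs = np.linspace(0.0, 1.0, num_points)
    losses = np.array(
        [loss_fn(interpolate(theta_start, theta_end, x)) for x in xs]
    )
    return xs, losses

# Hypothetical stand-ins: random "initial" and "trained" weight vectors and a
# quadratic bowl centered on the trained weights (not a real network loss).
rng = np.random.default_rng(0)
theta_init = rng.normal(size=10)
theta_final = rng.normal(size=10)

def toy_loss(theta):
    return float(np.sum((theta - theta_final) ** 2))

xs, losses = loss_along_path(toy_loss, theta_init, theta_final)
```

With a real model, `loss_fn` would load the interpolated weights into the network and evaluate it on a held-out set; the shape of `losses` plotted against `xs` is what the figures report.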

Paper Structure

This paper contains 5 sections, 3 figures.

Figures (3)

  • Figure 1: The test loss (left) and error (right) when linearly interpolating at 100 points from the state of the network before training ($x=0$) to the state of the network after training ($x=1$). Comparable MNIST results are in Figure 4a of Goodfellow et al. (2014).
  • Figure 2: The test loss (left) and error (right) when linearly interpolating at 100 points from the state of the network at the specified iteration ($x=0$) to the state of the network after training ($x=1$).
  • Figure 3: The maximum increase in loss (top) and error (bottom) along the linear path between the state of the network at iteration $t$ (on the x-axis; log scale) and the state of the network after training.
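The quantity plotted in Figure 3 can be computed from a sampled loss (or error) curve along the path. The sketch below is one possible reading, not the paper's definition: it assumes "maximum increase" means the largest rise above the value at the start of the path ($x=0$), and the function name `max_increase` is hypothetical.

```python
import numpy as np

def max_increase(values):
    """Largest rise above the starting value along a sampled path.

    Assumption: the baseline is the value at x = 0. Returns 0.0 when the
    curve never rises above where it started.
    """
    values = np.asarray(values, dtype=float)
    return float(np.max(values - values[0]))

# Illustrative (made-up) curves: one that only decreases, one with a barrier.
monotone_path = [2.0, 1.5, 1.0, 0.2]
barrier_path = [1.0, 1.4, 2.5, 0.3]

no_barrier = max_increase(monotone_path)
barrier = max_increase(barrier_path)
```

A value of zero means the path from iteration $t$ to the final weights never climbs above its starting loss; a positive value indicates a barrier separating the two states along the line.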