Loss Patterns of Neural Networks

Ivan Skorokhodov, Mikhail Burtsev

TL;DR

Loss Patterns of Neural Networks introduces multi-point optimization (MPO) to study neural network loss landscapes by optimizing a two-dimensional manifold containing $K$ weight vectors in $\mathbb{R}^n$. MPO minimizes cross-entropy at some points of the manifold and maximizes it at others, following a pattern of black/white pixels, to shape the loss surface. This allows many parameterizations to be trained simultaneously with substantial memory savings. Empirical results on FashionMNIST and CIFAR10 show that the loss surface is highly diverse and can realize arbitrary 2D patterns, while batch normalization smooths the landscape and generally improves mean accuracy. The work provides a new lens on loss landscape analysis, with potential applications to decorrelated ensembles and theoretical understanding, and is accompanied by public code on GitHub.

Abstract

We present multi-point optimization: an optimization technique that allows training several models simultaneously without the need to keep the parameters of each one individually. The proposed method is used for a thorough empirical analysis of the loss landscape of neural networks. Through extensive experiments on the FashionMNIST and CIFAR10 datasets we demonstrate two things: 1) the loss surface is surprisingly diverse and intricate in terms of the landscape patterns it contains, and 2) adding batch normalization makes it smoother. Source code to reproduce all the reported results is available on GitHub: https://github.com/universome/loss-patterns.
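The idea summarized above (a 2D manifold of weight vectors, with cross-entropy minimized at some points and maximized at others according to a binary pattern) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper or its repository: the manifold is taken to be a plane `origin + u * right + v * up`, the model is a linear softmax classifier on synthetic data, the maximization is clamped by a hypothetical `max_loss` threshold, and all function names are invented. The point it shows is the memory saving: only three weight vectors are stored, however many grid cells the pattern has.

```python
import numpy as np


def xent_and_grad(w, X, y, n_classes):
    """Mean cross-entropy of a linear softmax classifier and its gradient w.r.t. flat weights w."""
    W = w.reshape(X.shape[1], n_classes)
    logits = X @ W
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    rows = np.arange(X.shape[0])
    loss = -np.log(probs[rows, y] + 1e-12).mean()
    dlogits = probs.copy()
    dlogits[rows, y] -= 1.0
    grad = (X.T @ dlogits / X.shape[0]).ravel()
    return loss, grad


def mpo_step(params, coords, target, X, y, n_classes, lr=0.1, max_loss=2.0):
    """One update of the plane (origin, right, up) at plane coordinates (u, v).

    target == 1: descend on the loss at this point (assumed "minimize" cell).
    target == 0: ascend on the loss until it reaches max_loss (assumed clamp).
    """
    origin, right, up = params
    u, v = coords
    w = origin + u * right + v * up  # weight vector of this grid cell
    loss, g = xent_and_grad(w, X, y, n_classes)
    if target == 1:
        sign = 1.0
    elif loss < max_loss:
        sign = -1.0
    else:
        return params  # loss is already high enough; leave the plane unchanged
    # Chain rule: dw/d(origin) = 1, dw/d(right) = u, dw/d(up) = v.
    return (origin - lr * sign * g,
            right - lr * sign * u * g,
            up - lr * sign * v * g)


def loss_map(params, shape, X, y, n_classes):
    """Evaluate the loss at every grid cell of the plane."""
    origin, right, up = params
    h, w_ = shape
    out = np.empty(shape)
    for i in range(h):
        for j in range(w_):
            u, v = i / (h - 1), j / (w_ - 1)
            out[i, j], _ = xent_and_grad(origin + u * right + v * up, X, y, n_classes)
    return out


def run_demo(steps=200, seed=0):
    """Fit a random binary mask (fill probability 0.5) on synthetic data."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(64, 5))
    y = rng.integers(0, 3, size=64)
    mask = (rng.random((4, 4)) < 0.5).astype(int)
    n = X.shape[1] * 3
    params = (np.zeros(n), 0.01 * rng.normal(size=n), 0.01 * rng.normal(size=n))
    for _ in range(steps):
        i, j = rng.integers(4), rng.integers(4)  # sample one cell per step
        params = mpo_step(params, (i / 3, j / 3), mask[i, j], X, y, 3)
    return loss_map(params, mask.shape, X, y, 3)
```

Calling `run_demo()` returns a 4×4 array of loss values, one per grid cell. Whether such a small convex model can actually reproduce a given mask is not claimed here; the sketch only shows how gradients from individual cells flow back into the three shared vectors.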

Paper Structure

This paper contains 11 sections, 4 equations, 6 figures.

Figures (6)

  • Figure 1: Examples of loss landscapes of a typical CNN model on the FashionMNIST and CIFAR10 datasets found with MPO. Loss values are color-coded on a logarithmic scale. We used a small VGG-like model whose architecture is presented in the hyperparameters appendix; additional visualizations are presented in the visualizations appendix. Loss values were computed on the test sets.
  • Figure 2: Multi-point optimization method for 2D pattern fitting on the FashionMNIST dataset.
  • Figure 3: (a) Example of a random binary mask with a filling probability of 0.5. (b) The result of the MPO procedure for the model without batch normalization. (c) The result of the MPO procedure for the model with batch normalization.
  • Figure 4: Additional results for pattern search on the FashionMNIST dataset. Since the train and test landscapes are visually almost indistinguishable for our model on FashionMNIST, we depict only test loss surfaces here.
  • Figure 5: Additional results for pattern search on the CIFAR10 dataset. The left column depicts the train loss/accuracy surface; the right column depicts the test loss/accuracy surface.
  • ...and 1 more figure