Loss Patterns of Neural Networks

Ivan Skorokhodov; Mikhail Burtsev

TL;DR

Loss Patterns of Neural Networks introduces multi-point optimization (MPO), a technique for studying neural network loss landscapes by optimizing a two-dimensional manifold that contains $K$ weight vectors in $\mathbb{R}^n$. MPO minimizes cross-entropy at some points of the manifold and maximizes it at others, following a pattern of black and white pixels, so that the loss surface takes the shape of the pattern. This trains many parameterizations simultaneously with substantial memory savings. Empirical results on FashionMNIST and CIFAR10 show that the loss surface is highly diverse and can realize arbitrary 2D patterns, while batch normalization smooths the landscape and generally improves mean accuracy. The work offers a new lens on loss landscape analysis, with potential applications to decorrelated ensembles and theoretical understanding, and is accompanied by public code on GitHub.

Abstract

We present multi-point optimization: an optimization technique that trains several models simultaneously without the need to store the parameters of each one individually. The proposed method is used for a thorough empirical analysis of the loss landscape of neural networks. Through extensive experiments on the FashionMNIST and CIFAR10 datasets we demonstrate two things: 1) the loss surface is surprisingly diverse and intricate in terms of the landscape patterns it contains, and 2) adding batch normalization makes it smoother. Source code to reproduce all the reported results is available on GitHub: https://github.com/universome/loss-patterns.
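To make the idea concrete, the following is a minimal sketch of how a pattern-shaped objective over a two-dimensional slice of weight space might look. It is not taken from the authors' repository. The plane parameterization (an origin plus two direction vectors), the logistic model standing in for a neural network, and the names `cell_weights`, `cross_entropy`, and `mpo_objective` are all assumptions made for illustration. The sketch also leaves the maximized term unbounded, which a practical implementation would probably need to control.

```python
import math


def cell_weights(origin, right, up, row, col):
    """Weight vector at grid cell (row, col) of the plane (hypothetical parameterization)."""
    return [o + col * r + row * u for o, r, u in zip(origin, right, up)]


def cross_entropy(weights, x, label):
    """Binary cross-entropy of a logistic model, used here as a stand-in for a network."""
    logit = sum(w * xi for w, xi in zip(weights, x))
    p = 1.0 / (1.0 + math.exp(-logit))
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep the logarithm finite
    return -math.log(p) if label == 1 else -math.log(1.0 - p)


def mpo_objective(origin, right, up, mask, x, label):
    """Signed mean loss over the grid: minimized where mask is 1, maximized where it is 0."""
    total = 0.0
    for row, cells in enumerate(mask):
        for col, low in enumerate(cells):
            loss = cross_entropy(cell_weights(origin, right, up, row, col), x, label)
            total += loss if low else -loss
    return total / (len(mask) * len(mask[0]))


# A 2x2 checkerboard pattern over a two-parameter model (illustrative values only).
pattern = [[1, 0], [0, 1]]
value = mpo_objective([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], pattern, [1.0, -1.0], 1)
```

Under this assumed parameterization, only `origin`, `right`, and `up` are stored, that is, three vectors of size $n$, while every grid cell still corresponds to its own weight vector. This matches the memory saving the text describes in spirit, though the actual method may define the manifold differently.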
