Loss Patterns of Neural Networks
Ivan Skorokhodov, Mikhail Burtsev
TL;DR
Loss Patterns of Neural Networks introduces multi-point optimization (MPO), a technique for studying neural network loss landscapes by optimizing a two-dimensional manifold that contains $K$ weight vectors in $\mathbb{R}^n$. MPO minimizes cross-entropy at some points of the manifold and maximizes it at others, following a pattern of black and white pixels, so that the loss surface takes the shape of the pattern. Because the many parameterizations are trained simultaneously through the shared manifold, memory use is substantially lower than storing each one individually. Empirical results on FashionMNIST and CIFAR10 show that the loss surface is highly diverse and can realize arbitrary 2D patterns, while batch normalization smooths the landscape and generally improves mean accuracy. The work offers a new lens on loss landscape analysis, with potential applications to decorrelated ensembles and theoretical understanding, and is accompanied by public code on GitHub.
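To make the pattern objective concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the manifold is taken to be a flat plane spanned by three stored vectors (`origin`, `right`, `up`), the model is a linear softmax classifier, and cells marked 1 are assumed to be the ones where cross-entropy is minimized. All function and parameter names are hypothetical.

```python
import numpy as np

def point(origin, right, up, i, j):
    """Weight vector at grid cell (i, j), derived on the fly.

    Assumption: the 2D manifold is a flat plane. Only origin, right and up
    are stored, so memory does not grow with the number of grid points K.
    """
    return origin + i * up + j * right

def cross_entropy(w, x, y, num_classes):
    """Mean cross-entropy of a linear softmax classifier with flat weights w.

    Assumption: a linear model stands in for the neural network.
    """
    W = w.reshape(x.shape[1], num_classes)
    logits = x @ W
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

def pattern_objective(origin, right, up, pattern, x, y, num_classes):
    """Signed sum of per-cell losses over a binary pattern.

    Assumption: +CE where pattern == 1 (loss pushed down when this objective
    is minimized) and -CE where pattern == 0 (loss pushed up).
    """
    rows, cols = pattern.shape
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            sign = 1.0 if pattern[i, j] == 1 else -1.0
            w = point(origin, right, up, i, j)
            total += sign * cross_entropy(w, x, y, num_classes)
    return total
```

Minimizing `pattern_objective` with respect to `origin`, `right` and `up` (for example with an autodiff framework) would shape the loss over the grid to follow `pattern`; the sketch only evaluates the objective and leaves the optimizer out.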
Abstract
We present multi-point optimization: an optimization technique that trains several models simultaneously without keeping the parameters of each one individually. We use the proposed method for a thorough empirical analysis of the loss landscape of neural networks. Through extensive experiments on the FashionMNIST and CIFAR10 datasets, we demonstrate two things: 1) the loss surface is surprisingly diverse and intricate in terms of the landscape patterns it contains, and 2) adding batch normalization makes it smoother. Source code to reproduce all the reported results is available on GitHub: https://github.com/universome/loss-patterns.
