On the loss landscape of a class of deep neural networks with no bad local valleys
Quynh Nguyen, Mahesh Chandra Mukkamala, Matthias Hein
TL;DR
By introducing a class of deep networks with skip-connections to the output and analytic activations, the authors prove that the empirical cross-entropy loss \Phi(U,V) has no bad local valleys and that there exist uncountably many solutions with zero training error. They show that from any initialization there exists a continuous path along which \Phi is non-increasing and can be driven arbitrarily close to zero, which implies that the considered losses have no suboptimal strict local minima and no local maxima. Empirically, these skip-output networks trained with SGD generalize well on MNIST and CIFAR-10, while a random-feature baseline that fixes \Psi(U) and optimizes only V overfits, illustrating SGD's implicit regularization. Overall, the work provides a practical framework for studying implicit regularization in deep nets and positions skip-output architectures as useful benchmarks for loss-landscape analyses.
Abstract
We identify a class of over-parameterized deep neural networks with standard activation functions and cross-entropy loss which provably have no bad local valley, in the sense that from any point in parameter space there exists a continuous path on which the cross-entropy loss is non-increasing and gets arbitrarily close to zero. This implies that these networks have no sub-optimal strict local minima.
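To make the notation concrete, the following is a minimal NumPy sketch of a skip-output network and of the random-feature baseline mentioned in the TL;DR. It is an illustration under stated assumptions, not the authors' implementation: the layer widths, the tanh activation, the choice to connect every hidden unit to the output, the learning rate, and the helper names (`features`, `cross_entropy`, `train_output_layer`) are all hypothetical. Here `features` plays the role of \Psi(U), `cross_entropy` the role of \Phi(U,V), and only V is trained while U stays at its random initialization.

```python
import numpy as np

def features(U, X):
    """Psi(U): concatenation of all hidden-layer outputs (skip-connections to the output).

    Assumption of this sketch: every hidden unit feeds the output layer, and
    tanh stands in for an analytic, strictly increasing activation.
    """
    h, skips = X, []
    for W, b in U:
        h = np.tanh(h @ W + b)
        skips.append(h)
    return np.concatenate(skips, axis=1)

def softmax(logits):
    # Subtract the row-wise maximum for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(V, Psi, y):
    """Phi(U, V) for fixed features Psi = Psi(U): mean softmax cross-entropy."""
    P = softmax(Psi @ V)
    return -np.log(P[np.arange(len(y)), y]).mean()

def train_output_layer(Psi, y, num_classes, steps=500, lr=0.1):
    """Random-feature baseline: U is frozen, only V is updated by gradient descent."""
    V = np.zeros((Psi.shape[1], num_classes))
    Y = np.eye(num_classes)[y]          # one-hot labels
    losses = []
    for _ in range(steps):
        losses.append(cross_entropy(V, Psi, y))
        grad = Psi.T @ (softmax(Psi @ V) - Y) / len(y)
        V -= lr * grad
    return V, losses

# Toy data (hypothetical): 30 samples, 5 input dimensions, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = rng.integers(0, 3, size=30)

# Two hidden layers of width 8 with random, untrained weights U.
U = [(rng.normal(size=(5, 8)), np.zeros(8)),
     (rng.normal(size=(8, 8)), np.zeros(8))]

Psi = features(U, X)                    # shape (30, 16): both layers reach the output
V, losses = train_output_layer(Psi, y, num_classes=3)
print(losses[0], losses[-1])
```

With U fixed, the loss is convex in V, so plain gradient descent with a small enough step only decreases it. This sketch says nothing about generalization; the paper's comparison between SGD on the full network and the random-feature baseline concerns test error on MNIST and CIFAR-10, which a toy example like this does not reproduce.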
