Mildly Overparameterized ReLU Networks Have a Favorable Loss Landscape
Kedar Karhadkar, Michael Murray, Hanna Tseran, Guido Montúfar
TL;DR
This paper analyzes the loss landscapes of mildly overparameterized ReLU networks on finite datasets under the squared loss. It shows that most activation regions contain no bad local minima and, for one-dimensional inputs, that most realizable regions also contain a high-dimensional set of global minima. It develops Jacobian-rank-based arguments and combinatorial region counting to characterize when differentiable critical points are global minima, and it gives explicit results for shallow two-layer networks and for one-dimensional inputs. The results extend to deep networks under mild width conditions and include volume-based bounds obtained via anticoncentration arguments. Experiments show a phase transition in the prevalence of full-rank Jacobians as the amount of overparameterization varies. Collectively, the work suggests that realistic levels of overparameterization yield a largely benign optimization landscape on generic finite datasets, without relying on a particular initialization scheme or data distribution, though open questions remain for intermediate widths and deeper architectures.
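The Jacobian-rank argument can be sketched in a few lines. What follows is the standard reasoning for the squared loss, written in notation assumed here rather than taken from the paper. Inside a fixed activation region the activation pattern does not change, so the network outputs are smooth (in fact polynomial) functions of the parameters, and the gradient factors through the Jacobian of the outputs.

```latex
% Assumed notation (not taken from the paper): n data points, parameters
% \theta \in \mathbb{R}^p, network outputs F(\theta) \in \mathbb{R}^n, targets y.
L(\theta) = \tfrac{1}{2}\,\lVert F(\theta) - y \rVert_2^2,
\qquad
J(\theta) = \frac{\partial F}{\partial \theta}(\theta) \in \mathbb{R}^{n \times p},
\qquad
\nabla L(\theta) = J(\theta)^{\top}\bigl(F(\theta) - y\bigr).

% If J has full row rank, J^T is injective, so a critical point interpolates y:
\operatorname{rank} J(\theta) = n
\;\Longrightarrow\;
\ker J(\theta)^{\top} = \{0\}
\;\Longrightarrow\;
\bigl(\nabla L(\theta) = 0 \iff F(\theta) = y \iff L(\theta) = 0\bigr).

% If rank J = n on an entire open activation region R, F is a submersion on R,
% so its zero-loss set there is a smooth manifold of dimension p - n (if nonempty):
\dim\bigl(F^{-1}(y) \cap R\bigr) = p - n.
```

The last line is why full rank throughout a region, with p much larger than n, goes together with a high-dimensional set of global minima in that region.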
Abstract
We study the loss landscape of both shallow and deep, mildly overparameterized ReLU neural networks on a generic finite input dataset for the squared error loss. We show, both by count and by volume, that most activation patterns correspond to parameter regions with no bad local minima. Furthermore, for one-dimensional input data, we show that most activation regions realizable by the network contain a high-dimensional set of global minima and no bad local minima. We experimentally confirm these results by finding a phase transition, depending on the amount of overparameterization, from most regions having a full-rank Jacobian to many regions having a rank-deficient Jacobian.
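A minimal NumPy sketch of the kind of rank experiment described above. It rests on several assumptions of our own: a two-layer network f(x) = sum_j v_j * relu(w_j . x + b_j), a Jacobian taken only with respect to the first-layer weights and biases, and Gaussian data and parameters. The helper names `relu_jacobian` and `full_rank_fraction` are hypothetical and do not come from the paper's code. Drawing parameters at random weights each activation region by its Gaussian volume, so the estimate is closer in spirit to the volume-based statements than to the count-based ones.

```python
import numpy as np


def relu_jacobian(X, W, b, v):
    """Jacobian of f(x_i) = sum_j v_j * relu(w_j . x_i + b_j) with respect to
    the first-layer parameters (W, b), evaluated on a fixed dataset.

    X: (n, d) inputs, W: (m, d) weights, b: (m,) biases, v: (m,) output weights.
    Returns an array of shape (n, m * (d + 1)).
    """
    n = X.shape[0]
    # Activation pattern: A[i, j] = 1.0 if neuron j is active on input i.
    A = (X @ W.T + b > 0).astype(float)
    # Append a constant 1 to each input so the bias is treated like a weight.
    X_aug = np.hstack([X, np.ones((n, 1))])
    # d f(x_i) / d (w_j, b_j) = v_j * A[i, j] * (x_i, 1)
    J = (A * v)[:, :, None] * X_aug[:, None, :]
    return J.reshape(n, -1)


def full_rank_fraction(n, d, m, trials=100, seed=0):
    """Fraction of random (data, parameter) draws whose Jacobian has rank n."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        X = rng.standard_normal((n, d))
        W = rng.standard_normal((m, d))
        b = rng.standard_normal(m)
        v = rng.standard_normal(m)
        J = relu_jacobian(X, W, b, v)
        hits += int(np.linalg.matrix_rank(J) == n)
    return hits / trials


if __name__ == "__main__":
    n, d = 20, 2
    # Sweep the hidden width m; the Jacobian has m * (d + 1) columns.
    for m in [2, 5, 7, 10, 20, 50, 100]:
        print(m, full_rank_fraction(n, d, m))
```

With n = 20 and d = 2 the Jacobian has m * (d + 1) columns, so widths below 7 cannot reach rank 20 at all. Printing the fraction for increasing m shows where full rank becomes typical in this particular toy setup; it is not a reproduction of the paper's experiments.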
