Exploring the loss landscape of regularized neural networks via convex duality
Sungyoon Kim, Aaron Mishkin, Mert Pilanci
TL;DR
The paper develops a duality-based framework to analyze the loss landscape of regularized ReLU networks by casting the training problem into a convex cone-constrained form and studying its dual. It shows that for wide enough two-layer networks the problem is equivalent to a cone-constrained group LASSO, with the dual optimum $\nu^*$ determining a fixed set of optimal directions and yielding a polyhedral description of the optimal set. A staircase of connectivity is established as the network width crosses critical thresholds, and nonunique minimum-norm interpolators are constructed, highlighting the role of regularization and architectural choices. The approach generalizes to vector-valued and parallel deep architectures, preserving finite sets of weight directions and extending the connectivity results. The findings illuminate how regularization shapes the loss landscape and offer tools for understanding optimization dynamics in practice.
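For orientation, here is one standard way to write the primal pair and dual behind the "cone-constrained group LASSO" statement, following the convex reformulation of Pilanci and Ergen (2020). This assumes squared loss, weight-decay parameter β, data matrix X ∈ R^{n×d}, and labels y ∈ R^n. The paper may work with a general convex loss and different notation, so the symbols below are illustrative rather than the paper's exact statement.

```latex
% Nonconvex two-layer ReLU training problem of width m (squared loss assumed)
\min_{\{u_j,\alpha_j\}_{j=1}^{m}} \; \frac{1}{2}\Big\|\sum_{j=1}^{m} (X u_j)_+ \,\alpha_j - y\Big\|_2^2
  + \frac{\beta}{2}\sum_{j=1}^{m}\big(\|u_j\|_2^2 + \alpha_j^2\big)

% Cone-constrained group LASSO over the P activation patterns D_i = diag(1[X h_i >= 0])
\min_{\{v_i,w_i\}_{i=1}^{P}} \; \frac{1}{2}\Big\|\sum_{i=1}^{P} D_i X (v_i - w_i) - y\Big\|_2^2
  + \beta \sum_{i=1}^{P}\big(\|v_i\|_2 + \|w_i\|_2\big)
\quad \text{s.t.} \quad (2D_i - I) X v_i \ge 0, \;\; (2D_i - I) X w_i \ge 0

% Dual of the convex problem
\max_{\nu} \; -\frac{1}{2}\|\nu - y\|_2^2 + \frac{1}{2}\|y\|_2^2
\quad \text{s.t.} \quad \max_{\|u\|_2 \le 1} \big|\nu^\top (X u)_+\big| \le \beta
```

Under this convention, the optimal dual variable is the residual $\nu^* = y - \hat{y}^*$ of any optimal model. Complementary slackness forces every active neuron direction $u = u_j/\|u_j\|_2$ to satisfy $|\nu^{*\top}(Xu)_+| = \beta$. This is the sense in which $\nu^*$ pins down the admissible optimal directions once the width is large enough for the two minima to coincide.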
Abstract
We discuss several aspects of the loss landscape of regularized neural networks: the structure of stationary points, the connectivity of optimal solutions, the existence of a path with nonincreasing loss to an arbitrary global optimum, and the nonuniqueness of optimal solutions. We do so by casting the problem into an equivalent convex problem and considering its dual. Starting from two-layer neural networks with scalar output, we first characterize the solution set of the convex problem using its dual, and we then characterize all stationary points. With this characterization, we show that the topology of the global optima goes through a phase transition as the width of the network changes, and we construct counterexamples showing that the problem can have a continuum of optimal solutions. Finally, we show that the solution set characterization and connectivity results extend to other architectures, including two-layer vector-valued neural networks and parallel three-layer neural networks.
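The "equivalent convex problem" rests on a weight mapping between the two formulations. The sketch below numerically checks the easy direction of that mapping. It takes feasible points of a cone-constrained group LASSO and maps each nonzero block v to a ReLU neuron with u = v/√‖v‖ and α = ±√‖v‖. It then confirms that the network reproduces the same predictions and the same regularized objective. Several assumptions apply, and none of them is taken from the paper's code. The loss is squared loss with weight decay β. Activation patterns are subsampled from random directions, which is a common heuristic rather than exhaustive enumeration. All function names (`sample_patterns`, `to_network`, and so on) are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sample_patterns(X, num_samples, rng):
    """Sample distinct activation patterns D_i = diag(1[X h >= 0]) from random directions h.

    Returns (masks, generators): each mask is the boolean diagonal of D_i, and each
    generator h satisfies the cone constraint (2 D_i - I) X h >= 0 by construction.
    Random sampling (an assumption here) may miss some patterns.
    """
    seen = {}
    for _ in range(num_samples):
        h = rng.standard_normal(X.shape[1])
        mask = X @ h >= 0
        seen.setdefault(mask.tobytes(), (mask, h))
    masks = [m for m, _ in seen.values()]
    gens = [h for _, h in seen.values()]
    return masks, gens


def is_feasible(X, mask, v, tol=1e-10):
    """Check the cone constraint (2 D_i - I) X v >= 0 up to a small tolerance."""
    signs = 2.0 * mask - 1.0
    return bool(np.all(signs * (X @ v) >= -tol))


def convex_objective(X, y, masks, V, W, beta):
    """Group LASSO objective with predictions sum_i D_i X (v_i - w_i)."""
    pred = np.zeros(X.shape[0])
    for mask, v, w in zip(masks, V, W):
        pred += mask * (X @ (v - w))
    penalty = sum(np.linalg.norm(v) + np.linalg.norm(w) for v, w in zip(V, W))
    return 0.5 * np.sum((pred - y) ** 2) + beta * penalty, pred


def to_network(V, W):
    """Map each nonzero block to one neuron: u = vec / sqrt(||vec||), alpha = +/- sqrt(||vec||)."""
    U, alpha = [], []
    for v, w in zip(V, W):
        for vec, sign in ((v, 1.0), (w, -1.0)):
            norm = np.linalg.norm(vec)
            if norm > 0:  # zero blocks correspond to unused neurons and are dropped
                U.append(vec / np.sqrt(norm))
                alpha.append(sign * np.sqrt(norm))
    return np.array(U), np.array(alpha)


def network_objective(X, y, U, alpha, beta):
    """0.5 * ||f(X) - y||^2 + beta/2 * (||U||_F^2 + ||alpha||^2) for f(X) = relu(X U^T) alpha."""
    pred = relu(X @ U.T) @ alpha
    reg = 0.5 * beta * (np.sum(U ** 2) + np.sum(alpha ** 2))
    return 0.5 * np.sum((pred - y) ** 2) + reg, pred


rng = np.random.default_rng(0)
n, d, beta = 8, 3, 0.1
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

masks, gens = sample_patterns(X, 200, rng)
# Positive multiples of a pattern's generator stay inside that pattern's cone.
V = [rng.uniform(0.1, 1.0) * h for h in gens]
W = [rng.uniform(0.1, 1.0) * h for h in gens]
assert all(is_feasible(X, m, v) and is_feasible(X, m, w) for m, v, w in zip(masks, V, W))

f_cvx, p_cvx = convex_objective(X, y, masks, V, W, beta)
U, alpha = to_network(V, W)
f_net, p_net = network_objective(X, y, U, alpha, beta)
print(np.allclose(p_cvx, p_net), np.isclose(f_cvx, f_net))  # expected: True True
```

The check passes because the cone constraint guarantees (Xv)_+ = D_i X v. In addition, the rescaling makes the weight-decay term β/2 (‖u‖² + α²) equal to β‖v‖, which is exactly the group LASSO penalty. The sketch only verifies that convex-feasible points map to networks with equal objective. It does not solve either problem, and it does not show the reverse direction, namely that no network of sufficient width does better. That reverse direction is what the convex reformulation and duality results supply.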
