Optimal Sets and Solution Paths of ReLU Networks
Aaron Mishkin, Mert Pilanci
TL;DR
The paper characterizes the global optima and solution paths of shallow ReLU networks by recasting training as a convex program in lifted parameters, which shows that the global optima form a polyhedral set. Viewing the convex program as a constrained group Lasso, it derives an explicit description of the optimal set, computes dual parameters, and introduces an optimal pruning algorithm for obtaining minimal networks. It also analyzes the regularization path, gives conditions under which the path is continuous, and provides min-norm path computation and sensitivity results. Experiments on UCI benchmarks, MNIST, and CIFAR-10 show substantial variation among optimal models and demonstrate the practical efficacy of the pruning approach and of theory-grounded tuning.
Abstract
We develop an analytical framework to characterize the set of optimal ReLU neural networks by reformulating the non-convex training problem as a convex program. We show that the global optima of the convex parameterization are given by a polyhedral set and then extend this characterization to the optimal set of the non-convex training objective. Since all stationary points of the ReLU training problem can be represented as optima of sub-sampled convex programs, our work provides a general expression for all critical points of the non-convex objective. We then leverage our results to provide an optimal pruning algorithm for computing minimal networks, establish conditions for the regularization path of ReLU networks to be continuous, and develop sensitivity results for minimal ReLU networks.
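To make the convex reformulation more concrete, the sketch below illustrates the activation-pattern idea it rests on. It assumes the usual construction from the convex-reformulation literature, which the abstract does not spell out. Each neuron with weights w induces a 0/1 pattern D = diag(1[Xw >= 0]) on the data matrix X. On the cone where (2D - I)Xw >= 0, the ReLU output equals the linear map DXw. All function names, the toy data, and the random sampling of patterns are illustrative assumptions, not the authors' implementation.

```python
import random


def matvec(X, w):
    """Multiply a data matrix X (list of rows) by a weight vector w."""
    return [sum(x_j * w_j for x_j, w_j in zip(row, w)) for row in X]


def activation_pattern(X, g):
    """Diagonal of D = diag(1[X g >= 0]) as a tuple of 0/1 entries."""
    return tuple(1 if z >= 0 else 0 for z in matvec(X, g))


def sample_patterns(X, num_samples, seed=0):
    """Collect distinct activation patterns from random Gaussian gate vectors.

    Sampling is an assumption of this sketch; it only finds a subset of
    the patterns that X admits.
    """
    rng = random.Random(seed)
    d = len(X[0])
    patterns = set()
    for _ in range(num_samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        patterns.add(activation_pattern(X, g))
    return sorted(patterns)


def in_cone(X, pattern, w, tol=1e-12):
    """Check the cone constraint (2D - I) X w >= 0 for a given pattern."""
    return all((2 * p - 1) * z >= -tol for p, z in zip(pattern, matvec(X, w)))


# Toy data: three examples in two dimensions (hypothetical values).
X = [[1.0, 2.0], [-1.0, 0.5], [0.5, -1.5]]
w = [0.3, -0.7]

D = activation_pattern(X, w)
relu_out = [max(z, 0.0) for z in matvec(X, w)]       # non-convex form
linear_out = [p * z for p, z in zip(D, matvec(X, w))]  # D X w, linear in w

patterns = sample_patterns(X, num_samples=200)
```

Here `relu_out` and `linear_out` agree because `w` lies in the cone of its own pattern `D`. Fixing the patterns in advance is what turns the ReLU nonlinearity into linear constraints, which is why the lifted problem can be treated as a constrained group Lasso.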
