Optimal Sets and Solution Paths of ReLU Networks

Aaron Mishkin; Mert Pilanci

TL;DR

The paper addresses the challenge of understanding global optima and solution paths for shallow ReLU networks by recasting training as a convex program in lifted parameters, revealing that the global optima form a polyhedral set. Through a constrained group Lasso lens, it derives an explicit description of the optimal set, computes dual parameters, and introduces an optimal pruning algorithm to obtain minimal networks. It further analyzes the regularization path, establishes continuity under key conditions, and provides min-norm path computation and sensitivity results, offering a principled view of regularization and stability in ReLU models. Empirical results on UCI benchmarks, MNIST, and CIFAR-10 demonstrate substantial variation among optimal models and showcase the practical efficacy of the proposed pruning approach and theory-grounded tuning.
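To see why a group Lasso problem can have a whole set of global optima rather than a single one, consider a toy case that is not taken from the paper: two identical feature blocks. Splitting any weight vector `w` between the two blocks along the same direction leaves both the prediction and the penalty unchanged, so the objective is constant along a line segment. The function names and the duplicated-block setup below are illustrative assumptions.

```python
import numpy as np

def group_lasso_objective(blocks, weights, y, lam):
    """0.5 * ||sum_i X_i w_i - y||^2 + lam * sum_i ||w_i||_2."""
    pred = sum(Xi @ wi for Xi, wi in zip(blocks, weights))
    penalty = sum(np.linalg.norm(wi) for wi in weights)
    return 0.5 * np.sum((pred - y) ** 2) + lam * penalty

def split_solution(w, t):
    """Split w across two identical blocks as (t * w, (1 - t) * w), 0 <= t <= 1."""
    return [t * w, (1.0 - t) * w]
```

If `w` minimizes the single-block problem, every split with `t` in `[0, 1]` attains the same value in the two-block problem, which gives a segment of equally good solutions that differ in how they distribute weight across blocks.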

Abstract

We develop an analytical framework to characterize the set of optimal ReLU neural networks by reformulating the non-convex training problem as a convex program. We show that the global optima of the convex parameterization are given by a polyhedral set and then extend this characterization to the optimal set of the non-convex training objective. Since all stationary points of the ReLU training problem can be represented as optima of sub-sampled convex programs, our work provides a general expression for all critical points of the non-convex objective. We then leverage our results to provide an optimal pruning algorithm for computing minimal networks, establish conditions for the regularization path of ReLU networks to be continuous, and develop sensitivity results for minimal ReLU networks.
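As a rough illustration of what a convex program in lifted parameters can look like, the sketch below fits a gated ReLU model with fixed gate vectors by proximal gradient descent on a group Lasso objective. It assumes the commonly used gated form, in which each activation pattern is `1[X g_i >= 0]` for a fixed gate `g_i`, and it omits the cone constraints that the ReLU case requires, so it covers only the unconstrained variant. The function names, the solver choice, and the step-size rule are assumptions for this example, not the authors' implementation.

```python
import numpy as np

def gate_patterns(X, gates):
    """Return an (n, p) 0/1 matrix whose column i is the pattern 1[X g_i >= 0]."""
    return (X @ gates >= 0).astype(X.dtype)

def objective(W, X, y, D, lam):
    """0.5 * ||sum_i D_i X w_i - y||^2 + lam * sum_i ||w_i||_2, with W of shape (p, d)."""
    pred = np.sum(D * (X @ W.T), axis=1)
    return 0.5 * np.sum((pred - y) ** 2) + lam * np.sum(np.linalg.norm(W, axis=1))

def group_soft_threshold(W, tau):
    """Proximal operator of tau * sum_i ||w_i||_2, applied to each row of W."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * W

def fit_gated_relu(X, y, gates, lam, iters=500):
    """Proximal gradient descent on the (unconstrained) gated ReLU group Lasso."""
    n, d = X.shape
    D = gate_patterns(X, gates)  # (n, p)
    p = D.shape[1]
    # Step size 1/L, where L is the squared spectral norm of [D_1 X, ..., D_p X].
    A = np.concatenate([D[:, [i]] * X for i in range(p)], axis=1)
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)
    W = np.zeros((p, d))
    for _ in range(iters):
        resid = np.sum(D * (X @ W.T), axis=1) - y   # residual of the current fit
        grad = (D * resid[:, None]).T @ X           # row i is X^T D_i resid
        W = group_soft_threshold(W - step * grad, step * lam)
    return W, D
```

Because the objective is convex in `W`, any minimizer found this way is a global optimum of the lifted problem; rows of `W` that are driven exactly to zero correspond to inactive neurons.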

Paper Structure (29 sections, 63 theorems, 205 equations, 7 figures, 2 tables, 2 algorithms)

Introduction
Related Work
Notation
Convex Reformulations
Gated ReLU Networks
The Constrained Group Lasso
Describing the Optimal Set
Computing Dual Optimal Parameters
Minimal Solutions and Optimal Pruning
Continuity of the Solution Path
The Min-Norm Path
Sensitivity
Specialization to Neural Networks
Experiments
Conclusion
...and 14 more sections

Key Result

Proposition 3.0

Fix $\lambda > 0$. The optimal set for the CGL problem is given by

Figures (7)

Figure 1: Convex vs non-convex solution spaces for two-layer ReLU networks. We plot the first feature of three different neurons; the non-convex parameterization maps the compact polytope of solutions for the convex parameterization into a curved manifold.
Figure 2: Pruning neurons from two-layer ReLU networks on binary classification tasks from the UCI repository. We compare our theory-inspired approach (Optimal/LS) against removing the neuron with the smallest $\ell_2$ norm (Neuron Magnitude), removing the neuron with the smallest weighted gradient norm (Gradient Magnitude), and random pruning (Random). For Optimal/LS, we use the paper's pruning algorithm for neural networks, which begins with optimal pruning and then switches to a least-squares heuristic. We plot test accuracy against the number of active neurons. Optimal/LS dominates the baseline methods on every dataset and even improves test accuracy on breast-cancer and fertility.
Figure 3: Pruning neurons from two-layer ReLU networks on two binary classification tasks drawn from the CIFAR-10 dataset. We compare our method (Optimal/LS) against baselines; see Figure 2 for details. Our approach, which makes use of a weight correction after pruning, outperforms every baseline.
Figure 4: Pruning neurons on five datasets from the UCI repository. This figure extends Figure 2 with training accuracy in addition to the test accuracies shown in the main paper.
Figure 5: Pruning neurons on five additional datasets from the UCI repository. See Figure 2 for details. Our method (Optimal/LS) preserves test accuracy for longer than the baseline methods, leading to compact models with better generalization.
...and 2 more figures
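
The Neuron Magnitude baseline named in the Figure 2 caption can be sketched as follows for a two-layer ReLU network. The captions do not say exactly how a neuron's norm is measured, so the score used here, `||w1_i||_2 * |w2_i|`, is an assumption, as are the function names; this is a sketch of the baseline only, not of the Optimal/LS method.

```python
import numpy as np

def forward(X, W1, w2):
    """Two-layer ReLU network: sum_i relu(X w1_i) * w2_i."""
    return np.maximum(X @ W1.T, 0.0) @ w2

def prune_smallest_neuron(W1, w2):
    """Drop the neuron with the smallest score ||w1_i||_2 * |w2_i| (assumed score)."""
    scores = np.linalg.norm(W1, axis=1) * np.abs(w2)
    idx = int(np.argmin(scores))
    keep = np.arange(W1.shape[0]) != idx
    return W1[keep], w2[keep], idx
```

Calling `prune_smallest_neuron` repeatedly and evaluating `forward` on held-out data after each step would trace out an accuracy-versus-active-neurons curve of the kind the captions describe.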

Theorems & Definitions (105)

Proposition 3.0
Corollary 3.1
Lemma 3.1
Corollary 3.2
Proposition 3.2
Proposition 3.2
Corollary 3.3
Proposition 3.3
Corollary 3.4
Definition 3.5: Closed
...and 95 more
