Porcupine Neural Networks: (Almost) All Local Optima are Global
Soheil Feizi, Hamid Javadi, Jesse Zhang, David Tse
TL;DR
This work introduces Porcupine Neural Networks (PNNs), two-layer networks whose weight vectors are constrained to a finite set of lines, to illuminate why gradient-based methods often succeed in training larger unconstrained networks. Under Gaussian inputs and ReLU activations, it shows that most local optima of matched PNNs are global and identifies the regions where bad local optima may exist. It further derives how the PNN landscape changes under mismatches and how well unconstrained networks can be approximated by PNNs with polynomially many lines and neurons. The authors develop a kernel-based framework (involving ψ[K_L]) to characterize global optima and derive spectral-norm bounds via Schur complements for mismatched and general PNNs, complemented by asymptotic analyses. They validate the theory with experiments showing improved approximation and a higher likelihood of reaching global optima as model complexity grows, offering a potential explanation for the practical success of local search in deep learning. Overall, PNNs provide a tractable bridge between constrained analysis and unconstrained neural networks, with implications for understanding optimization landscapes and model approximation.
Abstract
Neural networks have been used prominently in several machine learning and statistics applications. In general, the underlying optimization of neural networks is non-convex, which makes their performance analysis challenging. In this paper, we take a novel approach to this problem by asking whether one can constrain the weights of a neural network so that its optimization landscape has good theoretical properties while, at the same time, the constrained network remains a good approximation of the unconstrained one. For two-layer neural networks, we provide affirmative answers to these questions by introducing Porcupine Neural Networks (PNNs), whose weight vectors are constrained to lie on a finite set of lines. We show that most local optima of PNN optimizations are global, and we characterize the regions where bad local optimizers may exist. Moreover, our theoretical and empirical results suggest that an unconstrained neural network can be approximated using a polynomially large PNN.
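To make the line constraint concrete, the following is a minimal NumPy sketch of a PNN forward pass, not the authors' implementation. It assumes a two-layer ReLU network whose output is the plain sum of the hidden units (second-layer weights fixed to one, an assumption made here for illustration), and it writes each hidden weight vector as a scalar times the unit direction of its assigned line. The names `pnn_weights`, `pnn_forward`, `directions`, `line_of_neuron`, and `scales` are hypothetical.

```python
import numpy as np

def pnn_weights(directions, line_of_neuron, scales):
    """Build hidden-layer weights constrained to a finite set of lines.

    directions:     (L, d) array, one direction vector per line
    line_of_neuron: (k,) integer array, the line index assigned to each neuron
    scales:         (k,) array, the only free parameter of each neuron
    """
    # Normalize so that each line is represented by a unit vector.
    unit_dirs = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    # Neuron i gets weight vector scales[i] * unit_dirs[line_of_neuron[i]].
    return scales[:, None] * unit_dirs[line_of_neuron]

def pnn_forward(X, directions, line_of_neuron, scales):
    """Sum of ReLU hidden units (assumed: second-layer weights equal to one)."""
    W = pnn_weights(directions, line_of_neuron, scales)   # shape (k, d)
    return np.maximum(X @ W.T, 0.0).sum(axis=1)           # shape (n,)

# Illustrative sizes: d = 5 inputs, L = 3 lines, k = 6 hidden neurons.
rng = np.random.default_rng(0)
directions = rng.standard_normal((3, 5))
line_of_neuron = np.array([0, 0, 1, 1, 2, 2])
scales = rng.standard_normal(6)
X = rng.standard_normal((4, 5))                           # Gaussian inputs

W = pnn_weights(directions, line_of_neuron, scales)
y = pnn_forward(X, directions, line_of_neuron, scales)
```

In this sketch, training would update only `scales` while `directions` and `line_of_neuron` stay fixed, which is one way to read "weight vectors constrained to lie on a finite set of lines".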
