Porcupine Neural Networks: (Almost) All Local Optima are Global

Soheil Feizi, Hamid Javadi, Jesse Zhang, David Tse

TL;DR

This work introduces Porcupine Neural Networks (PNNs), two-layer networks whose weight vectors are constrained to lie on a finite set of lines, to illuminate why gradient-based methods often succeed in training larger unconstrained networks. Under Gaussian inputs and ReLU activations, it shows that most local optima of matched PNNs are global and identifies the precise regions where bad local optima may exist. It further derives how the PNN landscape changes under mismatches and how well unconstrained networks can be approximated by PNNs with polynomially many lines and neurons. The authors develop a kernel-based framework (involving ψ[K_L]) to characterize global optima and derive spectral-norm bounds via Schur complements for mismatched and general PNNs, complemented by asymptotic analyses. Experiments support the theory: approximation improves and global optima become more likely as model complexity grows, which offers a potential explanation for the practical success of local search in deep learning. Overall, PNNs provide a tractable bridge between constrained analysis and unconstrained neural networks, with implications for understanding optimization landscapes and model approximation.

Abstract

Neural networks have been used prominently in several machine learning and statistics applications. In general, the underlying optimization of neural networks is non-convex, which makes their performance analysis challenging. In this paper, we take a novel approach to this problem by asking whether one can constrain a neural network's weights so that its optimization landscape has good theoretical properties while, at the same time, the constrained network remains a good approximation of the unconstrained one. For two-layer neural networks, we provide affirmative answers to these questions by introducing Porcupine Neural Networks (PNNs), whose weight vectors are constrained to lie over a finite set of lines. We show that most local optima of PNN optimizations are global, and we characterize the regions where bad local optimizers may exist. Moreover, our theoretical and empirical results suggest that an unconstrained neural network can be approximated using a polynomially-large PNN.
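To make the line constraint concrete, the following is a minimal NumPy sketch of a two-layer ReLU network whose hidden weight vectors are restricted to a fixed set of lines. It is an illustration under stated assumptions, not the authors' implementation: the names `pnn_forward`, `lines`, `assignment`, `scales`, and `empirical_loss` are hypothetical, the second-layer weights are assumed to be fixed to 1, and the squared loss is estimated from Gaussian samples rather than computed in closed form.

```python
import numpy as np

def pnn_forward(X, lines, assignment, scales):
    """Output of a two-layer ReLU network with line-constrained hidden weights.

    X:          (n, d) input samples
    lines:      (r, d) unit vectors, one per line (fixed, not trained)
    assignment: (k,) index of the line each hidden neuron is tied to
    scales:     (k,) signed scalars, the only trainable parameters here
    """
    # Each hidden weight vector is a scalar multiple of its assigned line.
    W = scales[:, None] * lines[assignment]            # shape (k, d)
    # Assumption: second-layer weights are all 1, so the output is a plain sum.
    return np.maximum(X @ W.T, 0.0).sum(axis=1)        # shape (n,)

rng = np.random.default_rng(0)
d, r, k, n = 5, 3, 6, 10_000                           # illustrative sizes only

lines = rng.standard_normal((r, d))
lines /= np.linalg.norm(lines, axis=1, keepdims=True)  # normalize directions
assignment = rng.integers(0, r, size=k)

# "Matched" setting: the ground-truth network uses the same lines.
true_scales = rng.standard_normal(k)
X = rng.standard_normal((n, d))                        # Gaussian inputs
y = pnn_forward(X, lines, assignment, true_scales)

def empirical_loss(scales):
    """Monte Carlo estimate of the squared loss against the ground truth."""
    return np.mean((pnn_forward(X, lines, assignment, scales) - y) ** 2)
```

Because only `scales` varies, the search space has `k` dimensions instead of `k * d`, which is what makes the constrained landscape easier to analyze. Evaluating `empirical_loss` at `true_scales` gives zero by construction; any local search over `scales` would start from a different point.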

Paper Structure

This paper contains 39 sections, 31 theorems, 168 equations, 11 figures.

Key Result

Theorem 1

The loss function eq:loss-function for a scalar PNN can be written as

Figures (11)

  • Figure 1: (a) A two-layer Porcupine Neural Network (PNN). (b) In PNN, an incoming weight vector to a neuron is constrained to lie over a line in a $d$-dimensional space.
  • Figure 2: Approximations of an unconstrained two-layer neural network with $d=15$ inputs and $k^*=20$ hidden neurons using random two-layer PNNs.
  • Figure 3: Examples of (a) scalar PNN, and (b) degree-one PNN structures.
  • Figure 4: For the scalar PNN, parameter regions where $s(\mathbf{W})=\pm \mathbf{1}$ may include bad local optima. In other regions, all local optima are global. This figure highlights regions where $s(\mathbf{W})=\pm \mathbf{1}$ for a scalar PNN with two neurons.
  • Figure 5: The landscape of the loss function for a scalar PNN with two neurons. In panel (a), we consider $w_1^*=6$ and $w_2^*=4$, while in panel (b), we have $w_1^*=6$ and $w_2^*=-4$. According to Theorem \ref{thm:local}, in the case of panel (a), the loss function does not have bad local optimizers, while in the case of panel (b), it has bad local optimizers in regions $R\left((-1,-1)\right)$ and $R\left((1,1)\right)$.
  • ...and 6 more figures

Theorems & Definitions (35)

  • Theorem 1
  • Corollary 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Corollary 2
  • Theorem 5
  • Theorem 6
  • Lemma 1
  • Corollary 3
  • ...and 25 more