
Depth Creates No Bad Local Minima

Haihao Lu, Kenji Kawaguchi

TL;DR

This work investigates whether depth, in the absence of nonlinearity, can introduce bad local minima in deep linear networks. It shows that although depth yields a non-convex loss surface, it creates no new bad local minima. The argument relates the deep parameterization to a rank-constrained shallow model and relates their local minima under full-row-rank conditions. The main contributions are two results: (i) every local minimum of the deep problem maps to a local minimum of a shallow, rank-constrained problem, and (ii) if the input is sufficiently informative, every local minimum of the shallow problem is global, so every local minimum of the deep network is global as well. The approach gives a simpler and more general proof that extends beyond the square (Frobenius) loss and may have implications for matrix completion.
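
The deep-to-shallow reduction in result (i) rests on two elementary facts that can be checked on a toy instance: the deep loss, evaluated layer by layer, equals the shallow loss at the end-to-end matrix $R = W_H \cdots W_1$, and $R$ has rank at most the narrowest layer width. The sketch below is illustrative only and is not taken from the paper; the helper names (`matmul`, `end_to_end`, `rank`, `deep_loss`, `shallow_loss`), the weight ordering `[W_1, ..., W_H]`, and the dimensions are hypothetical choices.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def end_to_end(weights):
    """Collapse hypothetical weights [W_1, ..., W_H] into R = W_H ... W_1."""
    r = weights[0]
    for w in weights[1:]:
        r = matmul(w, r)
    return r


def rank(m):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def square_loss(pred, y):
    """(1/2) * squared Frobenius norm of pred - y."""
    return Fraction(1, 2) * sum((p - t) ** 2
                                for prow, trow in zip(pred, y)
                                for p, t in zip(prow, trow))


def deep_loss(weights, x, y):
    """Deep objective (1/2)||W_H ... W_1 X - Y||_F^2, computed layer by layer."""
    out = x
    for w in weights:
        out = matmul(w, out)
    return square_loss(out, y)


def shallow_loss(r, x, y):
    """Shallow objective (1/2)||R X - Y||_F^2."""
    return square_loss(matmul(r, x), y)


# Hypothetical toy instance: d_x = 3, one hidden layer of width 1, d_y = 2.
W1 = [[1, 2, 0]]                        # 1 x 3
W2 = [[1], [3]]                         # 2 x 1
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # 3 x 3, full row rank
Y = [[1, 0, 0], [0, 1, 0]]              # 2 x 3, full row rank

R = end_to_end([W1, W2])
print(R)                                          # [[1, 2, 0], [3, 6, 0]]
print(deep_loss([W1, W2], X, Y), shallow_loss(R, X, Y))  # 19 19
print(rank(R))                                    # 1, i.e. at most the width-1 bottleneck
```

Exact `Fraction` arithmetic is used so that the rank computation has no floating-point tolerance to tune; for real-sized matrices one would use a numerical library instead.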

Abstract

In deep learning, depth, as well as nonlinearity, creates non-convex loss surfaces. Does depth alone, then, create bad local minima? In this paper, we prove that without nonlinearity, depth alone does not create bad local minima, although it does induce a non-convex loss surface. Using this insight, we greatly simplify a recently proposed proof that all of the local minima of feedforward deep linear neural networks are global minima. Our theoretical results generalize previous results under fewer assumptions, and this analysis provides a method to show similar results beyond square loss in deep linear models.
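
The claim that depth alone makes the loss non-convex without creating bad local minima can be seen in the smallest possible case. The toy example below is ours, not the paper's: a scalar two-layer linear network with a single data point $x = y = 1$ (the names `scalar_deep_loss` and `scalar_shallow_loss` are hypothetical). Its loss $\tfrac12(w_2 w_1 - 1)^2$ fails midpoint convexity, yet its only critical points are the global minimizers $w_1 w_2 = 1$ and the saddle at the origin.

```python
from fractions import Fraction


def scalar_deep_loss(w1, w2, x=1, y=1):
    """Two-layer scalar linear network: L(w1, w2) = (1/2) * (w2 * w1 * x - y)^2."""
    return Fraction(1, 2) * (w2 * w1 * x - y) ** 2


def scalar_shallow_loss(r, x=1, y=1):
    """The same model in terms of the product r = w2 * w1; convex in r."""
    return Fraction(1, 2) * (r * x - y) ** 2


# The deep loss is the shallow loss evaluated at the product of the weights.
print(scalar_deep_loss(2, 3) == scalar_shallow_loss(3 * 2))  # True

# Two global minimizers (w1 * w2 = 1, loss 0) and their midpoint.
a, b, mid = (1, 1), (-1, -1), (0, 0)
print(scalar_deep_loss(*a), scalar_deep_loss(*b), scalar_deep_loss(*mid))  # 0 0 1/2

# Convexity would need L(mid) <= (L(a) + L(b)) / 2 = 0, so L is not convex.
# (0, 0) is a critical point but not a local minimum: moving along w1 = w2 = t lowers the loss.
t = Fraction(1, 10)
print(scalar_deep_loss(t, t) < scalar_deep_loss(*mid))  # True
```

The non-convexity comes entirely from the product $w_2 w_1$; in the collapsed variable $r$ the problem is a convex quadratic whose only minimizer is $r = 1$.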

Paper Structure

This paper contains 7 sections, 10 theorems and lemmas, and 26 equations.

Key Result

Theorem 2.1

(Depth creates no new bad local minima) Assume that $X$ and $Y$ have full row rank. If $\bar{W}=\{\bar{W}_{1},\dots,\bar{W}_{H}\}$ is a local minimum of problem (eq:obj), then $\bar{R} = \bar{W}_{H} \bar{W}_{H-1}\cdots \bar{W}_{1}$ achieves the value of a local minimum of problem (eq:newprob).
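
For reference, the two problems named in Theorem 2.1 can be written out as follows. This is our reading of them from the abstract and TL;DR (square loss, deep product versus rank-constrained single matrix); the dimension notation ($d_k$, $d_p$, $m$ samples stored as columns of $X \in \mathbb{R}^{d_x \times m}$ and $Y \in \mathbb{R}^{d_y \times m}$) is an assumption and may differ from the paper's.

```latex
\begin{aligned}
&\text{(eq:obj)}\quad \min_{W_1,\dots,W_H}\; L(W) = \tfrac{1}{2}\,\bigl\| W_H W_{H-1}\cdots W_1 X - Y \bigr\|_F^2,
  \qquad W_k \in \mathbb{R}^{d_k \times d_{k-1}},\; d_0 = d_x,\; d_H = d_y,\\
&\text{(eq:newprob)}\quad \min_{R \in \mathbb{R}^{d_y \times d_x}}\; f(R) = \tfrac{1}{2}\,\| R X - Y \|_F^2
  \quad \text{s.t.}\quad \operatorname{rank}(R) \le d_p := \min_{0 \le k \le H} d_k .
\end{aligned}
```

Since $L(W) = f(W_H \cdots W_1)$ and $\operatorname{rank}(W_H \cdots W_1) \le d_p$, every choice of weights gives a feasible point of the shallow problem with the same objective value; Theorem 2.1 strengthens this correspondence from arbitrary points to local minima.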

Theorems & Lemmas (10)

  • Theorem 2.1
  • Theorem 2.2
  • Theorem 2.3
  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Theorem 3.1
  • Theorem 3.2
  • Lemma 3.4
  • Theorem 3.3