Recent Advances in Non-convex Smoothness Conditions and Applicability to Deep Linear Neural Networks

Vivak Patel, Christian Varner

TL;DR

Non-convex optimization in deep learning motivates exploration of smoothness beyond global Lipschitz continuity. The paper introduces and orders generalized conditions—$\rho$-order Lipschitz continuity and $\rho$-integrated Lipschitz continuity—relating them to differentiability and Hessian behavior, and then tests their applicability on deep linear networks for binary classification. The main finding is that these generalized conditions do not hold for the gradient of deep linear networks of any depth; the gradient is instead locally Lipschitz, which remains the only reliable smoothness property in this setting. The work cautions practitioners to verify the relevant smoothness class for a given function family before relying on convergence analyses based on these generalized conditions, rather than assuming their validity a priori.
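
The failure of a global Lipschitz bound on the gradient can be seen numerically on a toy case. The sketch below assumes a scalar two-layer linear network with logistic loss, $f(w_1, w_2) = \log(1 + \exp(-y\, w_1 w_2\, x))$, and a single data point $(x, y) = (1, 1)$; this setup and the names `grad` and `lipschitz_ratio` are illustrative assumptions, not the paper's construction. It compares gradients at the nearby points $(0, t)$ and $(h, t)$ and shows that the difference quotient grows with $t$.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def grad(w1, w2, x=1.0, y=1.0):
    """Gradient of f(w1, w2) = log(1 + exp(-y * w1 * w2 * x)).

    Hypothetical toy model: a two-layer scalar linear network with
    logistic loss on one data point (x, y).
    """
    s = sigmoid(-y * w1 * w2 * x)
    return (-y * x * w2 * s, -y * x * w1 * s)

def lipschitz_ratio(t, h=1e-3):
    """||grad(h, t) - grad(0, t)|| / ||(h, t) - (0, t)||."""
    g0 = grad(0.0, t)
    g1 = grad(h, t)
    return math.hypot(g1[0] - g0[0], g1[1] - g0[1]) / h

# The quotient grows roughly like t**2 / 4, so no single constant bounds it.
ratios = [lipschitz_ratio(t) for t in (1.0, 10.0, 100.0)]
for t, r in zip((1.0, 10.0, 100.0), ratios):
    print(f"t = {t:6.1f}  ratio = {r:.3f}")
```

Because the quotient is unbounded as $t$ grows, no global Lipschitz constant exists for this gradient, while on any bounded region the quotient stays finite, which is consistent with local Lipschitz continuity. The sketch says nothing about the $\rho$-order or $\rho$-integrated conditions, whose definitions are given in the paper.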

Abstract

The presence of non-convexity in smooth optimization problems arising from deep learning has sparked new smoothness conditions in the literature and corresponding convergence analyses. We discuss these smoothness conditions, order them, provide conditions for determining whether they hold, and evaluate their applicability to training a deep linear neural network for binary classification.

Paper Structure (8 sections, 9 theorems, 35 equations, 2 figures, 1 table)

Key Result

Proposition 1

Let $(\mathcal{X},\Vert \cdot \Vert_\mathcal{X}), (\mathcal{Y}, \Vert \cdot \Vert_{\mathcal{Y}})$ be normed vector spaces over $\mathbb{R}$ or $\mathbb{C}$. Let $D: \mathcal{X} \to \mathcal{Y}$ be continuous and let $\rho \geq 0$. $D$ is $\rho$-order Lipschitz continuous if and only if it is $\rho$-

Figures (2)

  • Figure 1: A diagram of a simple feed forward network with three hidden layers and an output layer for binary classification.
  • Figure 2: A diagram of an arbitrary-depth, multi-dimensional feed forward network for binary classification.

Theorems & Definitions (25)

  • Definition 1: Globally Lipschitz Continuous
  • Remark 1
  • Definition 2: Level-set Lipschitz Continuous
  • Definition 3: Locally Lipschitz Continuous
  • Definition 4: $\rho$-Order Lipschitz Continuous
  • Definition 5: $\rho$-Integrated Lipschitz Continuous
  • Proposition 1
  • Proof of sufficiency of Proposition 1
  • Lemma 2
  • Proof
  • ...and 15 more