Recent Advances in Non-convex Smoothness Conditions and Applicability to Deep Linear Neural Networks

Vivak Patel; Christian Varner


TL;DR

Non-convex optimization in deep learning motivates exploration of smoothness beyond global Lipschitz continuity. The paper introduces and orders generalized conditions—$\rho$-order Lipschitz continuity and $\rho$-integrated Lipschitz continuity—relating them to differentiability and Hessian behavior, and then tests their applicability on deep linear networks for binary classification. The main finding is that these generalized conditions do not hold for the gradient of deep linear networks of any depth; the gradient is instead locally Lipschitz, which remains the only reliable smoothness property in this setting. The work cautions practitioners to verify the relevant smoothness class for a given function family before relying on convergence analyses based on these generalized conditions, rather than assuming their validity a priori.
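The gap between local and global Lipschitz continuity of the gradient can be seen numerically on a toy problem. The sketch below is not the paper's construction: it assumes a scalar two-layer linear network with a logistic loss, $f(w_1, w_2) = \log(1 + \exp(-y\, w_2 w_1 x))$, and the names `grad` and `lipschitz_ratio` are hypothetical helpers introduced only for illustration. It compares the gradient at the points $(0, t)$ and $(1/t, t)$ and reports the difference quotient, which would stay bounded in $t$ if the gradient were globally Lipschitz.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def grad(w1, w2, x=1.0, y=1.0):
    # Gradient of f(w1, w2) = log(1 + exp(-y * w2 * w1 * x)),
    # an assumed toy loss for a scalar two-layer linear network.
    s = sigmoid(-y * w2 * w1 * x)
    return (-y * x * w2 * s, -y * x * w1 * s)


def lipschitz_ratio(t, x=1.0, y=1.0):
    # Difference quotient ||grad(a) - grad(b)|| / ||a - b||
    # for the pair a = (0, t), b = (1/t, t).
    a = (0.0, t)
    b = (1.0 / t, t)
    ga = grad(a[0], a[1], x, y)
    gb = grad(b[0], b[1], x, y)
    num = math.hypot(ga[0] - gb[0], ga[1] - gb[1])
    den = math.hypot(a[0] - b[0], a[1] - b[1])
    return num / den


if __name__ == "__main__":
    for t in (1.0, 10.0, 100.0):
        print(t, lipschitz_ratio(t))
```

In this toy setting the quotient grows roughly like $t^2$, so no single Lipschitz constant works on the whole space, while on any bounded set the quotient stays bounded. This only illustrates the failure of global Lipschitz continuity; it says nothing by itself about the $\rho$-order or $\rho$-integrated conditions, whose definitions are not reproduced here.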

Abstract

The presence of non-convexity in smooth optimization problems arising from deep learning has sparked new smoothness conditions in the literature and corresponding convergence analyses. We discuss these smoothness conditions, order them, provide conditions for determining whether they hold, and evaluate their applicability to training a deep linear neural network for binary classification.


Paper Structure (8 sections, 9 theorems, 35 equations, 2 figures, 1 table)


Introduction
Smoothness Conditions
Ordering
Continuity Conditions and Differentiability
Applicability to Deep Linear Neural Networks
A Three Hidden Layer Neural Network
An Arbitrary-Depth Neural Network
Conclusion

Key Result

Proposition 1

Let $(\mathcal{X},\Vert \cdot \Vert_\mathcal{X}), (\mathcal{Y}, \Vert \cdot \Vert_{\mathcal{Y}})$ be normed vector spaces over $\mathbb{R}$ or $\mathbb{C}$. Let $D: \mathcal{X} \to \mathcal{Y}$ be continuous and let $\rho \geq 0$. $D$ is $\rho$-order Lipschitz continuous if and only if it is $\rho$-

Figures (2)

Figure 1: A diagram of a simple feed forward network with three hidden layers and an output layer for binary classification.
Figure 2: A diagram of an arbitrary-depth, multi-dimensional feed forward network for binary classification.

Theorems & Definitions (25)

Definition 1: Globally Lipschitz Continuous
Remark 1
Definition 2: Level-set Lipschitz Continuous
Definition 3: Locally Lipschitz Continuous
Definition 4: $\rho$-Order Lipschitz Continuous
Definition 5: $\rho$-Integrated Lipschitz Continuous
Proposition 1
Proof of sufficiency of Proposition 1
Lemma 2
proof
...and 15 more
