Recent Advances in Non-convex Smoothness Conditions and Applicability to Deep Linear Neural Networks
Vivak Patel, Christian Varner
TL;DR
Non-convex optimization in deep learning motivates smoothness conditions beyond global Lipschitz continuity. The paper introduces and orders two generalized conditions, $\rho$-order Lipschitz continuity and $\rho$-integrated Lipschitz continuity, relates them to differentiability and Hessian behavior, and then tests their applicability to deep linear networks for binary classification. The main finding is that these generalized conditions do not hold for the gradient of deep linear networks of any depth; the gradient is instead locally Lipschitz, which remains the only reliable smoothness property in this setting. Practitioners should therefore verify the relevant smoothness class for a given function family before relying on convergence analyses built on these generalized conditions, rather than assuming a priori that the conditions hold.
Abstract
The presence of non-convexity in smooth optimization problems arising from deep learning has sparked new smoothness conditions in the literature, along with corresponding convergence analyses. We discuss these smoothness conditions, order them, provide conditions for determining whether they hold, and evaluate their applicability to training a deep linear neural network for binary classification.
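
As a rough illustration of why global Lipschitz continuity of the gradient is too strong for this problem class, the sketch below uses a toy objective that is an assumption of this example and is not taken from the paper: a depth-2 linear network with scalar weights $w_1, w_2$, a single data point $(x, y) = (1, 1)$, and logistic loss $f(w_1, w_2) = \log(1 + \exp(-y x w_1 w_2))$. The helper names `grad` and `lipschitz_ratio` are hypothetical. The script estimates the local Lipschitz ratio of the gradient at points $(0, t)$ by a finite difference. It only probes the classical global condition and says nothing about the $\rho$-order or $\rho$-integrated conditions.

```python
import math


def grad(w1, w2, x=1.0, y=1.0):
    """Gradient of the toy objective f(w1, w2) = log(1 + exp(-y*x*w1*w2))."""
    s = y * x * w1 * w2
    # Derivative of log(1 + exp(-s)) with respect to s is -1 / (1 + exp(s)).
    # Two algebraically equal forms are used to avoid overflow in exp.
    if s < 0:
        dl = -1.0 / (1.0 + math.exp(s))
    else:
        dl = -math.exp(-s) / (1.0 + math.exp(-s))
    # Chain rule: ds/dw1 = y*x*w2 and ds/dw2 = y*x*w1.
    return (dl * y * x * w2, dl * y * x * w1)


def lipschitz_ratio(t, h=1e-6):
    """Finite-difference estimate of ||grad(h, t) - grad(0, t)|| / h."""
    g0 = grad(0.0, t)
    g1 = grad(h, t)
    return math.hypot(g1[0] - g0[0], g1[1] - g0[1]) / h


if __name__ == "__main__":
    # The ratio grows with t, so no single constant bounds it everywhere.
    for t in (1.0, 10.0, 100.0, 1000.0):
        print(f"t = {t:8.1f}   ratio = {lipschitz_ratio(t):.4g}")
```

For this toy objective, $\partial^2 f / \partial w_1^2 = w_2^2/4$ at $w_1 = 0$, so the printed ratio grows roughly like $t^2/4$: it is bounded on any bounded region, which is consistent with local Lipschitz continuity, but unbounded over the whole parameter space.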
