Optimization-Induced Dynamics of Lipschitz Continuity in Neural Networks

Róisín Luo; James McDermott; Christian Gagné; Qiang Sun; Colm O'Riordan

TL;DR

This work develops a rigorous stochastic-differential-equation (SDE) framework that models the temporal evolution of a neural network's Lipschitz continuity during SGD training. It decomposes the dynamics into layer-level and network-level drift and diffusion terms, driven by projections of the gradient flow, by mini-batch gradient noise, and by noise-curvature effects, and supports them with a detailed operator-norm perturbation analysis. A practical low-rank gradient-noise estimator makes these dynamics computable at scale on modern architectures, and the theory is validated on CIFAR-10/100 across multiple regularizers. The results show how initialization, batch size, label noise, and sampling trajectories shape Lipschitz growth, including unbounded growth near convergence and noise-regularization effects, offering insights for robust, trustworthy deep learning systems.
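The projections mentioned above can be made concrete for a single layer. The sketch below is illustrative only and is not the authors' estimator. It assumes access to per-sample gradients for one weight matrix, assumes 1-Lipschitz activations (e.g. ReLU) for the network-level bound, and uses hypothetical helper names (`top_singular_pair`, `lipschitz_upper_bound`, `layer_projections`). It computes the operator-norm Jacobian u_1 v_1^T of ||W||_2 and projects the mean gradient onto it to obtain a drift term. It then measures the variance of the projected per-sample gradients through the centered B-row gradient matrix, which is a low-rank factor of the noise covariance, so the full covariance is never formed. The noise-curvature (Hessian) term is omitted.

```python
import numpy as np


def top_singular_pair(W):
    """Return (sigma_1, u_1, v_1) for matrix W via a full SVD."""
    U, S, Vt = np.linalg.svd(W)
    return S[0], U[:, 0], Vt[0, :]


def lipschitz_upper_bound(weights):
    """Product of layer spectral norms: an upper bound on the network's
    Lipschitz constant, assuming 1-Lipschitz activations such as ReLU."""
    return float(np.prod([np.linalg.norm(W, 2) for W in weights]))


def layer_projections(W, per_sample_grads):
    """Project the mean gradient and its sampling noise onto d||W||_2 / dW.

    Hypothetical helper. Assumes the top singular value of W is simple, so
    that d||W||_2 / dW = u_1 v_1^T.

    per_sample_grads has shape (B, m, n): one loss gradient per example.
    Returns (drift, noise_var):
      drift     = <u1 v1^T, -mean_grad>, the first-order rate of change of
                  ||W||_2 along the negative gradient flow;
      noise_var = per-sample variance of <u1 v1^T, g_i>.
    """
    _, u, v = top_singular_pair(W)
    # <u v^T, G> = u^T G v, evaluated for every per-sample gradient at once.
    proj = np.einsum("i,bij,j->b", u, per_sample_grads, v)
    mean_grad = per_sample_grads.mean(axis=0)
    drift = -float(u @ mean_grad @ v)
    # With C the (B x mn) matrix of centered per-sample gradients, the noise
    # covariance is Sigma = C^T C / B (rank <= B). Only C @ vec(u v^T) is
    # needed, and it equals the centered projections below.
    centered = proj - proj.mean()
    noise_var = float(centered @ centered / len(proj))
    return drift, noise_var
```

For example, take `W = np.diag([3.0, 1.0])` and two per-sample gradients whose top-left entries are 1 and 3. The drift is then -2 and the per-sample projection variance is 1. Under i.i.d. sampling with replacement, the mini-batch noise variance along u_1 v_1^T would be this value divided by the batch size B. A full SVD is used here for clarity; power iteration would be the cheaper choice for large layers.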

Abstract

Lipschitz continuity characterizes the worst-case sensitivity of neural networks to small input perturbations; yet its dynamics (i.e., its temporal evolution) during training remain under-explored. We present a rigorous mathematical framework to model the temporal evolution of Lipschitz continuity during training with stochastic gradient descent (SGD). This framework leverages a system of stochastic differential equations (SDEs) to capture both deterministic and stochastic forces. Our theoretical analysis identifies three principal factors driving the evolution: (i) the projection of gradient flows, induced by the optimization dynamics, onto the operator-norm Jacobian of parameter matrices; (ii) the projection of gradient noise, arising from the randomness in mini-batch sampling, onto the operator-norm Jacobian; and (iii) the projection of the gradient noise onto the operator-norm Hessian of parameter matrices. Furthermore, our theoretical framework sheds light on how factors such as noisy supervision, parameter initialization, batch size, and mini-batch sampling trajectories shape the evolution of the Lipschitz continuity of neural networks. Our experimental results show strong agreement between the theoretical implications and the observed behaviors.
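The following sketch shows how the three factors can arise for a single weight matrix W. It is not the paper's derivation. It assumes the common SDE approximation of SGD with learning rate η, where Σ is the gradient-noise covariance of vec(W) and B_t is a standard Brownian motion; these symbols are introduced here for illustration. It also assumes that the largest singular value of W is simple, so that ||W||_2 is differentiable with gradient u_1 v_1^T, where u_1 and v_1 are the top singular vectors. Applying Itô's lemma to ||W_t||_2 then yields one term per factor.

```latex
\begin{aligned}
\mathrm{d}W_t &= -\nabla L(W_t)\,\mathrm{d}t + \sqrt{\eta}\,\Sigma(W_t)^{1/2}\,\mathrm{d}B_t,\\
\mathrm{d}\,\lVert W_t\rVert_2 &= \Big[\underbrace{\big\langle u_1 v_1^\top,\,-\nabla L(W_t)\big\rangle}_{\text{(i) gradient flow}}
 + \underbrace{\tfrac{\eta}{2}\operatorname{tr}\!\big(\Sigma(W_t)\,\nabla^2 \lVert W_t\rVert_2\big)}_{\text{(iii) noise-curvature}}\Big]\,\mathrm{d}t
 + \underbrace{\sqrt{\eta}\,\big\langle \operatorname{vec}(u_1 v_1^\top),\,\Sigma(W_t)^{1/2}\,\mathrm{d}B_t\big\rangle}_{\text{(ii) gradient noise}}.
\end{aligned}
```

In this form, term (i) is the deterministic drift from the gradient flow. Term (ii) is the diffusion term from mini-batch sampling. Term (iii) is an extra drift that appears only because ||W||_2 is curved, so noise alone can move the operator norm even when the mean gradient is zero.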
