Some Fundamental Aspects about Lipschitz Continuity of Neural Networks
Grigory Khromov, Sidak Pal Singh
TL;DR
The paper investigates the inherent Lipschitz behaviour of neural networks beyond traditional global bounds. It does so by empirically bounding the true Lipschitz constant between a data-grounded lower bound $C_{\mathcal{D}^+}$ and a simple upper bound $C_{\text{upper}}$. The study covers architectures ranging from FCNs to ResNet-50 and Vision Transformers, on datasets including MNIST and ImageNet. It shows that the local lower bound tracks the effective Lipschitz constant more faithfully than the upper bound, and that training increases both bounds. A Lipschitz Double Descent also appears as a function of width, in interplay with label noise. The work highlights an implicit regularisation effect in over-parameterised networks, whose Lipschitz behaviour interacts nontrivially with label noise. It suggests that effective Lipschitz measures may be more informative for generalisation and robustness analyses than naive upper-bound estimates. Overall, the findings provide a scalable framework and empirical scaffolding to guide the theoretical development of Lipschitz-based generalisation and robustness analyses for large neural networks.
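Bounds of this kind usually take a standard form. This summary does not spell out the paper's exact definitions, so the statement below rests on several assumptions. The network $f_\theta$ is a composition of affine layers $x \mapsto W_l x + b_l$, $l = 1, \dots, L$, with a 1-Lipschitz activation $\sigma$ (such as ReLU) between them. The norm is $\ell_2$. $\mathcal{D}^+$ is a finite set of data-derived inputs at which $f_\theta$ is differentiable. $J_{f_\theta}(x)$ denotes the input-output Jacobian, and $C$ is the true Lipschitz constant, with the supremum taken over points where $f_\theta$ is differentiable:

$$C_{\mathcal{D}^+} = \max_{x \in \mathcal{D}^+} \big\|J_{f_\theta}(x)\big\|_2 \;\le\; C = \sup_{x} \big\|J_{f_\theta}(x)\big\|_2 \;\le\; \prod_{l=1}^{L} \|W_l\|_2 = C_{\text{upper}}$$

The left inequality holds because each Jacobian norm is a local rate of change of $f_\theta$, which can never exceed the global one. The right inequality holds for three reasons: each affine layer is $\|W_l\|_2$-Lipschitz (biases do not matter), $\sigma$ is 1-Lipschitz, and Lipschitz constants multiply under composition.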
Abstract
Lipschitz continuity is a crucial functional property of any predictive model that naturally governs its robustness, generalisation, and adversarial vulnerability. Unlike other works that focus on obtaining tighter bounds and developing practical strategies to enforce certain Lipschitz properties, we aim to thoroughly examine and characterise the Lipschitz behaviour of neural networks. To this end, we carry out an empirical investigation across a range of settings (namely, architectures, datasets, label noise, and more) by pushing the simplest and most general lower and upper bounds to their limits. As highlights of this investigation, we showcase the remarkable fidelity of the lower Lipschitz bound and identify a striking Double Descent trend in both the upper and lower bounds on the Lipschitz constant. We also explain the intriguing effects of label noise on function smoothness and generalisation.
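The "simplest and most general" lower and upper bounds mentioned in the abstract can be sketched for a small ReLU network. This is a minimal illustration and not the authors' code. The setup makes several assumptions: it uses a plain ReLU MLP with random weights and the $\ell_2$ norm, and random Gaussian inputs stand in for a real dataset. Every function name (`init_relu_mlp`, `forward`, `input_jacobian`, `lower_bound`, `upper_bound`) is hypothetical. The lower bound is the largest spectral norm of the input-output Jacobian over a finite point set. The upper bound is the product of the layer spectral norms.

```python
import numpy as np

def init_relu_mlp(sizes, rng):
    """Hypothetical helper: random (W, b) pairs for a ReLU MLP with the given layer sizes."""
    return [(rng.standard_normal((m, n)) / np.sqrt(n), np.zeros(m))
            for n, m in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Evaluate the MLP on one input vector x (ReLU on hidden layers, linear output)."""
    h = x
    for i, (W, b) in enumerate(params):
        h = W @ h + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h

def input_jacobian(params, x):
    """Exact Jacobian df/dx at x, valid wherever no pre-activation is exactly zero."""
    J = np.eye(x.shape[0])
    h = x
    for i, (W, b) in enumerate(params):
        z = W @ h + b
        J = W @ J                           # chain rule through the linear map
        if i < len(params) - 1:
            mask = (z > 0).astype(float)    # ReLU derivative: 1 where active, else 0
            J = mask[:, None] * J
            h = np.maximum(z, 0.0)
        else:
            h = z
    return J

def lower_bound(params, points):
    """Max spectral norm of the Jacobian over a finite point set (lower bound on the l2 Lipschitz constant)."""
    return max(np.linalg.norm(input_jacobian(params, x), 2) for x in points)

def upper_bound(params):
    """Product of layer spectral norms (upper bound, valid because ReLU is 1-Lipschitz)."""
    return float(np.prod([np.linalg.norm(W, 2) for W, _ in params]))

rng = np.random.default_rng(0)
params = init_relu_mlp([20, 64, 64, 10], rng)
points = rng.standard_normal((256, 20))  # hypothetical stand-in for a dataset
c_lower = lower_bound(params, points)
c_upper = upper_bound(params)
print(f"lower = {c_lower:.3f}, upper = {c_upper:.3f}")
```

Replacing `points` with real training inputs and the random weights with trained ones would give the kind of data-grounded lower bound and product-of-norms upper bound that the summary compares. The product of norms costs only one spectral norm per layer, but it ignores which ReLUs are active together, which is why it tends to be loose for deeper networks.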
