On the Stability of Neural Networks in Deep Learning

Blaise Delattre

TL;DR

This work presents a unified framework for stabilizing neural networks through sensitivity analysis by combining Lipschitz constraints, curvature-based regularization, and randomized smoothing. It develops fast, deterministic spectral-norm estimation (Gram iteration) for dense and convolutional layers, introduces Lipschitz-by-design layers (including CPL and SLL), and demonstrates improved training stability and robustness. The thesis connects Weierstrass smoothing with randomized smoothing to derive tighter robustness certificates and introduces mechanisms like activation decay and LVM-RS to reduce variance and improve certification. Together, these methods yield practical, scalable tools for certifiable robustness and generalization, with broad implications for vision, NLP, and large-scale models. The work highlights open problems in scaling Lipschitz networks, handling attention in Lipschitz architectures, and extending certified robustness to large language models.

Abstract

Deep learning has achieved remarkable success across a wide range of tasks, but its models often suffer from instability and vulnerability: small changes to the input may drastically affect predictions, while optimization can be hindered by sharp loss landscapes. This thesis addresses these issues through the unifying perspective of sensitivity analysis, which examines how neural networks respond to perturbations at both the input and parameter levels. We study Lipschitz networks as a principled way to constrain sensitivity to input perturbations, thereby improving generalization, adversarial robustness, and training stability. To complement this architectural approach, we introduce regularization techniques based on the curvature of the loss function, promoting smoother optimization landscapes and reducing sensitivity to parameter variations. Randomized smoothing is also explored as a probabilistic method for enhancing robustness at decision boundaries. By combining these perspectives, we develop a unified framework where Lipschitz continuity, randomized smoothing, and curvature regularization interact to address fundamental challenges in stability. The thesis contributes both theoretical analysis and practical methodologies, including efficient spectral norm computation, novel Lipschitz-constrained layers, and improved certification procedures.
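The "efficient spectral norm computation" mentioned in the abstract is the Gram iteration named in the TL;DR. As a rough illustration only, the sketch below covers the dense (matrix) case, assuming the usual idea of repeatedly forming the Gram matrix $G \leftarrow G^\top G$ with rescaling and then taking a root of the Frobenius norm. The function name `gram_iteration_bound`, the default of 6 iterations, and the log-scale bookkeeping are illustrative assumptions, not the thesis's implementation; the convolutional variant is not shown.

```python
import numpy as np


def gram_iteration_bound(weight: np.ndarray, n_iter: int = 6) -> float:
    """Deterministic upper bound on the spectral norm of a dense matrix.

    Hypothetical helper: each step replaces G by G^T G, which squares the
    singular values, so sigma_1(W) <= ||G_N||_F ** (1 / 2**N).
    """
    g = np.asarray(weight, dtype=np.float64)
    log_scale = 0.0  # log of the scalar factored out by rescaling so far
    for _ in range(n_iter):
        norm = np.linalg.norm(g)  # Frobenius norm
        if norm == 0.0:
            return 0.0  # zero matrix has spectral norm zero
        # Rescale to avoid overflow; squaring G doubles the tracked log factor.
        log_scale = 2.0 * (log_scale + np.log(norm))
        g = g / norm
        g = g.T @ g
    final_norm = np.linalg.norm(g)
    return float(np.exp((log_scale + np.log(final_norm)) / 2.0 ** n_iter))


if __name__ == "__main__":
    w = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    print("Gram iteration bound:", gram_iteration_bound(w))
    print("Exact spectral norm: ", np.linalg.norm(w, 2))
```

Under these assumptions, the returned value is the $\ell_p$ norm of the singular values with $p = 2^{N+1}$, so it never falls below the largest singular value and tightens quickly as the iteration count $N$ grows.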

Paper Structure

This paper contains 161 sections, 37 theorems, 363 equations, 53 figures, 30 tables, 9 algorithms.

Key Result

Lemma 2.1.4

Let $\sigma > 0$, and let $f : \mathbb{R}^d \to \mathbb{R}$ be a function. Then $\tilde{f}$ is differentiable, and its gradient is given by:
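The statement is cut off before the formula. Assuming $\tilde{f}$ denotes the Weierstrass transform of Definition 2.1.3 with isotropic Gaussian noise of variance $\sigma^2$, and that $f$ is bounded and measurable (both assumptions, since the definition is not reproduced in this excerpt), the standard form of the identity reads:

$$
\tilde{f}(x) = \mathbb{E}_{\delta \sim \mathcal{N}(0, \sigma^2 I_d)}\left[ f(x + \delta) \right],
\qquad
\nabla \tilde{f}(x) = \frac{1}{\sigma^2}\, \mathbb{E}_{\delta \sim \mathcal{N}(0, \sigma^2 I_d)}\left[ \delta \, f(x + \delta) \right].
$$

The thesis may use a different but equivalent normalization.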

Figures (53)

  • Figure 1: Depiction of convex flat and sharp minima in one dimension. Both types of minima have the same relative shift between train and test losses.
  • Figure 2: Example of an adversarial attack on a stop sign. The perturbation is imperceptible to the human eye but causes a misclassification by the network. In this case, the stop sign is misclassified as a yield sign. Figure taken from ahmad2022developing.
  • Figure 3: Venn diagram illustrating the categorization of Lipschitz layers, including scaled layers, orthogonal layers, and residual layers.
  • Figure 4: Comparison of different Lipschitz layers with respect to several criteria, taken from prach2024lipschitz. Scores range from $1$ (worst) to $5$ (best) for each layer.
  • Figure 5: Example of an adversarial attack on an image classifier. The original image (left) is correctly classified as a pig, but the perturbed image (right) is misclassified as an airliner. The perturbation is imperceptible to the human eye but causes the classifier to make an incorrect prediction. The added noise is scaled by a factor of $0.005$ for visualization purposes. Example taken from the blog post by haldar2020adversarial.
  • ...and 48 more figures

Theorems & Definitions (71)

  • Definition 2.1.1: Local Lipschitz constant
  • Definition 2.1.2: Product Upper Bound ($\mathrm{PUB}$) for Lipschitz constant
  • Definition 2.1.3: Weierstrass transform zayed1996handbook
  • Lemma 2.1.4: Stein's Lemma stein1970singular
  • Lemma 2.1.5: Lipschitz continuity of the Weierstrass transform for bounded functions salman2019provably
  • Lemma 2.1.6: Lipschitz continuity of the Weierstrass transform for Lipschitz functions nesterov2017random
  • Lemma 2.1.7: Lipschitz bound for quantile-composed smoothed classifier salman2019provably, cohen2019certified
  • Definition 2.2.1: Adversarial attacks szegedy2013intriguing
  • Definition 2.2.2: Certified radius tsuzuku2018lipschitz
  • Definition 2.2.3: Certified accuracy for $\ell_2$-norm perturbations tsuzuku2018lipschitz
  • ...and 61 more