Certified Robustness via Dynamic Margin Maximization and Improved Lipschitz Regularization

Mahyar Fazlyab, Taha Entesari, Aniket Roy, Rama Chellappa

TL;DR

This work tackles adversarial robustness by targeting the input-space margin directly rather than solely maximizing the output margin. It introduces CRM, a training framework that couples a differentiable regularizer based on logit-difference Lipschitz constants with a scalable Lipschitz upper-bound estimator, LipLT, to shape the decision boundary efficiently. A key contribution is the derivation of Lipschitz-based surrogates for the certified radius, along with a loop-transformation technique that tightens Lipschitz bounds and scales to multi-layer networks. Empirical results on MNIST, CIFAR-10, and Tiny-ImageNet show competitive or superior certified robustness and improved Lipschitz estimation, with a practical, GPU-friendly implementation. Overall, the paper provides a principled, scalable path to end-to-end robust training with certified guarantees.

Abstract

To improve the robustness of deep classifiers against adversarial perturbations, many approaches have been proposed, such as designing new architectures with better robustness properties (e.g., Lipschitz-capped networks), or modifying the training process itself (e.g., min-max optimization, constrained learning, or regularization). These approaches, however, might not be effective at increasing the margin in the input (feature) space. As a result, there has been an increasing interest in developing training procedures that can directly manipulate the decision boundary in the input space. In this paper, we build upon recent developments in this category by developing a robust training algorithm whose objective is to increase the margin in the output (logit) space while regularizing the Lipschitz constant of the model along vulnerable directions. We show that these two objectives can directly promote larger margins in the input space. To this end, we develop a scalable method for calculating guaranteed differentiable upper bounds on the Lipschitz constant of neural networks accurately and efficiently. The relative accuracy of the bounds prevents excessive regularization and allows for more direct manipulation of the decision boundary. Furthermore, our Lipschitz bounding algorithm exploits the monotonicity and Lipschitz continuity of the activation layers, and the resulting bounds can be used to design new layers with controllable bounds on their Lipschitz constant. Experiments on the MNIST, CIFAR-10, and Tiny-ImageNet data sets verify that our proposed algorithm obtains competitively improved results compared to the state-of-the-art.
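The link between the logit margin, the Lipschitz constant, and the input-space margin can be illustrated with the standard Lipschitz certificate: if the difference $f_y - f_i$ is $L_{yi}$-Lipschitz, the prediction cannot change within a radius of $\min_{i \neq y} (f_y(x) - f_i(x)) / L_{yi}$. The sketch below is a minimal illustration of that general bound under the assumption that upper bounds on the pairwise Lipschitz constants are already available; the function name `certified_radius` and the example numbers are hypothetical and not taken from the paper.

```python
import numpy as np

def certified_radius(logits, label, pairwise_lipschitz):
    """Lipschitz-based lower bound on the distance from x to the decision boundary.

    logits: shape (K,), the model outputs f(x).
    label: index y of the class being certified.
    pairwise_lipschitz: shape (K,), entry i is an assumed upper bound on the
        Lipschitz constant of x -> f_y(x) - f_i(x); entry y is ignored.
    """
    logits = np.asarray(logits, dtype=float)
    lip = np.asarray(pairwise_lipschitz, dtype=float)
    others = np.arange(logits.shape[0]) != label
    margins = logits[label] - logits[others]
    if np.any(margins <= 0):
        # The point is not classified as `label`, so nothing can be certified.
        return 0.0
    # Each competing class i gives the bound margin_i / L_yi; take the worst case.
    return float(np.min(margins / lip[others]))

# Hypothetical three-class example.
logits = np.array([4.0, 1.0, 2.5])
lip = np.array([0.0, 2.0, 3.0])  # entry 0 is unused because label = 0
radius = certified_radius(logits, 0, lip)
print(radius)
```

The ratio shows why both objectives matter: the radius grows when the output margin in the numerator grows or when the Lipschitz bound in the denominator shrinks, and a loose Lipschitz bound directly understates the certified radius.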

Paper Structure (73 sections, 19 theorems, 87 equations, 8 figures, 8 tables, 3 algorithms)

This paper contains 73 sections, 19 theorems, 87 equations, 8 figures, 8 tables, 3 algorithms.

Key Result

Proposition 1

We have the following relationship between $\underline{R}_t^{\text{soft}}(x,y;\theta)$, the soft Lipschitz-based lower bound on the input margin, and $\underline{R}(x,y;\theta)$, the Lipschitz-based lower bound on the input margin.

Figures (8)

  • Figure 1: Loop transformation on a residual layer of the form $h(x)=Hx+G\phi(Wx)$. Here we use $\phi(x) = \tanh(x)$ to illustrate the loop transformation.
  • Figure 2: (a) Distribution of certified radii calculated using direct pairwise Lipschitz constants for the MNIST test dataset for the network trained using CRM. (b-c) Comparison of the distribution of the certified radii for a model trained using CRM (top) versus a standard trained model (bottom) for MNIST (b) and CIFAR-10 (c). For any given $\epsilon$, the probability curves denote the empirical probability of a data point from that data set having a certified radius of at least $\epsilon$.
  • Figure 3: Illustration of using mini-batching to parallelize the power iteration process of our proposed algorithm on the proposed toy example. The black lines and equations represent the calculations for the forward propagation, and the red ones represent the backward propagation. In this figure, the vertical stacking $\begin{bmatrix} x \\ y \end{bmatrix}$ represents concatenation along the mini-batch dimension.
  • Figure 4: Comparison of the effect of the parameter $r_0$ on the certified radius of test data points for (a) MNIST and (b) CIFAR-10.
  • Figure 5: The GPU implementation of LipLT exhibits the claimed linear time complexity. (a) Time spent on the calculation of pairwise Lipschitz constants. (b) Time for a full iteration of training.
  • ...and 3 more figures
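
The loop transformation in Figure 1 can be sketched numerically for the single residual layer $h(x)=Hx+G\phi(Wx)$. Assuming $\phi$ is slope-restricted in $[\alpha,\beta]$ (for $\tanh$, $[0,1]$), one can write $\phi(z)=\tfrac{\alpha+\beta}{2}z+\psi(z)$ with $\psi$ being $\tfrac{\beta-\alpha}{2}$-Lipschitz, which gives the bound $\|H+\tfrac{\alpha+\beta}{2}GW\|_2+\tfrac{\beta-\alpha}{2}\|G\|_2\|W\|_2$. The code below compares this with the naive bound $\|H\|_2+\beta\|G\|_2\|W\|_2$. It is a sketch of the idea for one layer only, not the paper's LipLT algorithm; the function names and the random matrices are illustrative assumptions.

```python
import numpy as np

def spectral_norm(M):
    # Largest singular value of M (exact here; power iteration would approximate it).
    return float(np.linalg.norm(M, 2))

def naive_bound(H, G, W, beta=1.0):
    # Triangle inequality plus the fact that phi is beta-Lipschitz (assumes 0 <= alpha <= beta).
    return spectral_norm(H) + beta * spectral_norm(G) * spectral_norm(W)

def loop_transformed_bound(H, G, W, alpha=0.0, beta=1.0):
    # Move the linear part ((alpha + beta) / 2) * z of phi into the linear branch,
    # leaving a residual nonlinearity that is ((beta - alpha) / 2)-Lipschitz.
    mid = 0.5 * (alpha + beta)
    half_width = 0.5 * (beta - alpha)
    return spectral_norm(H + mid * G @ W) + half_width * spectral_norm(G) * spectral_norm(W)

# Hypothetical random layer, for illustration only.
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 4))
W = rng.standard_normal((6, 4))
G = rng.standard_normal((4, 6))

naive = naive_bound(H, G, W)
tight = loop_transformed_bound(H, G, W)

def h(x):
    return H @ x + G @ np.tanh(W @ x)

# Empirical difference quotient on one random pair; it can never exceed a valid bound.
x1, x2 = rng.standard_normal(4), rng.standard_normal(4)
ratio = float(np.linalg.norm(h(x1) - h(x2)) / np.linalg.norm(x1 - x2))
print(naive, tight, ratio)
```

For $0 \le \alpha \le \beta$ the loop-transformed bound is never larger than the naive one, since $\|H+\tfrac{\alpha+\beta}{2}GW\|_2 \le \|H\|_2+\tfrac{\alpha+\beta}{2}\|G\|_2\|W\|_2$ by the triangle inequality.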

Theorems & Definitions (30)

  • Proposition 1
  • Theorem 1 (Fazlyab et al., 2019)
  • Proposition 2
  • Proposition 3
  • Theorem 2
  • Lemma 1
  • Theorem 3
  • Proposition 3
  • Proof
  • Proposition 3
  • ...and 20 more