Achieving Domain-Independent Certified Robustness via Knowledge Continuity

Alan Sun, Chiyu Ma, Kenneth Ge, Soroush Vosoughi

TL;DR

The paper presents knowledge continuity, a novel definition inspired by Lipschitz continuity that aims to certify the robustness of neural networks across input domains (such as continuous and discrete domains in vision and language, respectively). It also demonstrates that the expressiveness of a model class is not at odds with its knowledge continuity.

Abstract

We present knowledge continuity, a novel definition inspired by Lipschitz continuity which aims to certify the robustness of neural networks across input domains (such as continuous and discrete domains in vision and language, respectively). Most existing approaches that seek to certify robustness, especially those based on Lipschitz continuity, lie within the continuous domain with norm- and distribution-dependent guarantees. In contrast, our proposed definition yields certification guarantees that depend only on the loss function and the intermediate learned metric spaces of the neural network. These bounds are independent of domain modality, norms, and distribution. We further demonstrate that the expressiveness of a model class is not at odds with its knowledge continuity. This implies that achieving robustness by maximizing knowledge continuity should not theoretically hinder inferential performance. Finally, to complement our theoretical results, we present several applications of knowledge continuity, such as regularization and a certification algorithm, and show that knowledge continuity can be used to localize vulnerable components of a neural network.
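
The abstract says the guarantees depend only on the loss function and the learned hidden metric spaces. As a rough illustration of that idea, the sketch below scores each point by the average ratio of loss change to distance in a hidden representation. This is an assumption made for illustration, not the paper's Definition 2 ($k$-volatility), whose exact statement is not reproduced on this page. The function names `euclidean` and `volatility_estimate`, the choice of the Euclidean metric, and the toy data are all hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two hidden representations (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def volatility_estimate(hidden, losses, dist=euclidean):
    """Illustrative volatility-like score for each point (hypothetical helper).

    For point i, average |loss_i - loss_j| / dist(h_i, h_j) over all j whose
    hidden representation differs from h_i. A large score means the loss
    changes sharply relative to movement in the hidden space.
    """
    n = len(hidden)
    scores = []
    for i in range(n):
        ratios = []
        for j in range(n):
            d = dist(hidden[i], hidden[j])
            if d == 0.0:
                # Skip identical representations (including j == i).
                continue
            ratios.append(abs(losses[i] - losses[j]) / d)
        scores.append(sum(ratios) / len(ratios) if ratios else 0.0)
    return scores


# Toy data: three points on a line in a 2-D hidden space, with the loss
# growing linearly along that line.
hidden = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
losses = [0.0, 10.0, 20.0]
print(volatility_estimate(hidden, losses))  # [2.0, 2.0, 2.0]
```

Because the score uses only losses and hidden-space distances, it needs no metric on the input space itself, which is what would let the same computation apply to both images and text. In practice the hidden vectors would come from a chosen layer of a trained network rather than from hand-written tuples.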

Paper Structure

This paper contains 34 sections, 22 theorems, 42 equations, 6 figures, 4 tables, and 2 algorithms.

Key Result

Theorem 4.1

Let $A \subset \mathcal{X} \times \mathcal{Y}$ such that $\mathbb{P}_{\mathcal{D}_{\mathcal{X},\mathcal{Y}}}[A] > 0$ and $\delta, \eta > 0$. Let $A' = \left\{(x',y') \in \mathcal{X} \times \mathcal{Y}: \mathbb{E}_{\substack{(x,y)\sim \mathcal{D}_{\mathcal{X},\mathcal{Y}} \\ (x,y) \in A}}\Delta\mathc

Figures (6)

  • Figure 1: Examples of knowledge (dis)continuities. $f: \mathcal{X} \to \mathcal{Y}$ is a measurable map, and $(\mathcal{Z}_k,d_k)$ is one of its hidden representations. The color of the points indicates loss. $\vardiamondsuit$ denotes knowledge continuity induced by sparsity: an isolated concept with no knowledge relations close to it. So, any perturbation moves $\vardiamondsuit$ far away with high probability. Smooth changes in loss around $\bigstar$ imply knowledge continuity. Finally, $\rightpentagonblack$ is not knowledge continuous due to drastic changes in loss nearby. Notice that the classification of points is independent of input/output clustering behavior since $\mathcal{X},\mathcal{Y}$ may not be endowed with a metric.
  • Figure 2: (a) The average percentage of successful adversarial attacks by TextFooler [jin2020textfooler] on a host of models [roberts2019t5, radford2018gpt, devlin2018bert, liu2020roberta] and the IMDB [imdb] dataset, regressed with the average of knowledge continuity coefficients across all hidden layers ($R^2=0.35$). (b) $k$-volatility as $k$ is varied across a model's relative depth. (c) Correlation between $k$-volatility and adversarial vulnerability (averaged across all models shown in (b)) with respect to TextFooler [jin2020textfooler] as $k$ varies.
  • Figure 3: Average $k$-volatility plotted against the log of the number of model parameters (left). We see that although there is a strong negative correlation, the exact relationship is nontrivial. Moreover, this negative correlation is also consistently observed across model families (right).
  • Figure 4: Regularization $k$-volatility for a host of vision models. We apply two adversarial attacks, FGSM [goodfellow2014explaining] (top row) and SI-NI-FGSM [Lin2020Nesterov] (bottom row), with various attack strengths. Attack strength is measured in terms of the maximum $\ell_2$-norm of the perturbation applied to the image.
  • Figure 5: Ablation over the strength of regularization and its effect on the curves of attack success rate against attack strength (left). Ablation over the regularization strength (for fixed attack strength = 0.3) and its effect on test accuracy (right). We see that moderate regularization significantly improves robustness across all attack strengths. This improvement does not come at the expense of test accuracy. The attack strength is measured using the minimum angular similarity between the perturbed and original text. Both ablations use GPT2 on the IMDB [imdb] dataset under the TextFooler attack [jin2020textfooler].
  • ...and 1 more figure

Theorems & Definitions (43)

  • Definition 1: Metric Decomposition
  • Remark 1
  • Definition 2: $k$-Volatility
  • Definition 3: Pointwise $\varepsilon$-Knowledge Continuity
  • Definition 4: Expected $\varepsilon$-Knowledge Continuity
  • Theorem 4.1
  • Corollary 4.2
  • Proof
  • Definition 5: Universal Function Approximator
  • Proposition 4.3
  • ...and 33 more