Robust Feature Inference: A Test-time Defense Strategy using Spectral Projections

Anurag Singh, Mahalakshmi Sabanayagam, Krikamol Muandet, Debarghya Ghoshdastidar

TL;DR

This work proposes Robust Feature Inference (RFI), a novel test-time defense strategy that is easy to integrate with any existing (robust) training procedure without additional test-time computation, and theoretically characterizes the subspace of the eigenspectrum of the feature covariance that is the most robust for a generalized additive model.

Abstract

Test-time defenses are used to improve the robustness of deep neural networks to adversarial examples during inference. However, existing methods either require an additional trained classifier to detect and correct the adversarial samples, or perform additional complex optimization on the model parameters or the input to adapt to the adversarial samples at test time, resulting in a significant increase in inference time compared to the base model. In this work, we propose a novel test-time defense strategy called Robust Feature Inference (RFI) that is easy to integrate with any existing (robust) training procedure without additional test-time computation. Based on the notion of robustness of features that we present, the key idea is to project the trained models onto the most robust feature space, thereby reducing the vulnerability to adversarial attacks in non-robust directions. We theoretically characterize the subspace of the eigenspectrum of the feature covariance that is the most robust for a generalized additive model. Our extensive experiments on the CIFAR-10, CIFAR-100, Tiny ImageNet and ImageNet datasets for several robustness benchmarks, including the state-of-the-art methods in RobustBench, show that RFI consistently improves robustness across adaptive and transfer attacks. We also compare RFI with adaptive test-time defenses to demonstrate the effectiveness of our proposed approach.

Paper Structure

This paper contains 46 sections, 5 theorems, 22 equations, 6 figures, 19 tables, 1 algorithm.

Key Result

Theorem 3.2

Given $h(\mathbf{x}) = \bm{\beta}^\top \phi(\mathbf{x})$, assume that the distribution $\mathcal{D}$ is such that $y = h(\mathbf{x}) + \bm{\epsilon}$, where $\bm{\epsilon} \in \mathbb{R}^C$ has independent coordinates, each satisfying $\mathbb{E}[\epsilon_c] = 0$, $\mathbb{E}[\epsilon_c^2] \leq \sig$... where $\Sigma = \mathbb{E}_\mathbf{x} \left[\phi(\mathbf{x})\phi(\mathbf{x})^\top\right]$ and $\Ver$...

Figures (6)

  • Figure 1: Illustration of our test-time defense mechanism. Given any trained model $h(\mathbf{x})$, we first post-process the penultimate-layer features $\phi(\mathbf{x})$, using the training data, to obtain the most informative and robust features in the eigenspace $\tilde{\mathbf{U}}$. During inference on a test input $\mathbf{x}_t$, $\phi(\mathbf{x}_t)$ is projected onto the robust feature space via $\phi(\mathbf{x}_t)\tilde{\mathbf{U}}\tilde{\mathbf{U}}^\top$, which is equivalent to replacing $\bm\beta$ with $\tilde{\bm\beta}=\tilde{\mathbf{U}}\tilde{\mathbf{U}}^\top\bm\beta$.
  • Figure 2: Effect of $K$ in RFI. Robust accuracy and eigenvalue profile (in ascending order) for all the methods in Table \ref{tab:rfi_calib}.
  • Figure 3: NTK feature robustness for $\lambda$ and the corresponding eigenvalue profile in ascending order.
  • Figure 4: Ablation of performance with $K$ for all SoTA models for CIFAR-10 and CIFAR-100.
  • Figure 5: Eigenspectrum showing sharp drop at $K=$ number of classes for all SoTA models on CIFAR-10 and CIFAR-100.
  • ...and 1 more figure
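
The projection described in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `rfi_project_weights`, the synthetic data, and the choice of $K$ are hypothetical, and the sketch keeps the $K$ eigenvectors of the empirical feature covariance $\Sigma = \mathbb{E}[\phi(\mathbf{x})\phi(\mathbf{x})^\top]$ with the largest eigenvalues, which may differ from the selection criterion used in the paper. Features are stored as rows, so logits are computed as $\phi(\mathbf{x})\bm\beta$.

```python
import numpy as np


def rfi_project_weights(features, beta, k):
    """Project last-layer weights onto the span of k feature-covariance eigenvectors.

    features: (n, p) array of penultimate-layer features on training data.
    beta:     (p, C) array of last-layer weights.
    k:        number of eigenvectors to keep (assumption: ranked by eigenvalue).
    Returns (beta_tilde, u_tilde) with shapes (p, C) and (p, k).
    """
    n = features.shape[0]
    # Empirical (uncentered) feature covariance, matching Sigma = E[phi phi^T].
    sigma = features.T @ features / n
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(sigma)
    top = np.argsort(eigvals)[::-1][:k]
    u_tilde = eigvecs[:, top]
    # beta_tilde = U U^T beta, computed once after training.
    beta_tilde = u_tilde @ (u_tilde.T @ beta)
    return beta_tilde, u_tilde


# Synthetic stand-ins for real features and weights (hypothetical sizes).
rng = np.random.default_rng(0)
phi_train = rng.normal(size=(200, 16))
beta = rng.normal(size=(16, 10))
beta_tilde, u_tilde = rfi_project_weights(phi_train, beta, k=10)

# Projecting the test features or swapping in beta_tilde gives the same logits.
phi_test = rng.normal(size=(5, 16))
logits_projected_features = (phi_test @ u_tilde @ u_tilde.T) @ beta
logits_projected_weights = phi_test @ beta_tilde
```

Because $\tilde{\bm\beta}$ is computed once from the training features, inference reduces to the usual matrix product with a replaced weight matrix, which is consistent with the claim that the defense adds no test-time computation.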

Theorems & Definitions (13)

  • Definition 3.1: $\ell_2$-Robustness of features
  • Theorem 3.2: Lower bound on robustness
  • Remark 3.3: Lower bound is tight up to constants
  • Corollary 3.4
  • Definition 3.5: Informative features
  • Corollary 3.6
  • Proposition 5.1: Learning dynamics of GAM
  • Proposition 5.2: NTK feature robustness lies at the top
  • proof
  • proof
  • ...and 3 more