Robust Feature Inference: A Test-time Defense Strategy using Spectral Projections

Anurag Singh; Mahalakshmi Sabanayagam; Krikamol Muandet; Debarghya Ghoshdastidar

TL;DR

This work proposes a novel test-time defense strategy called Robust Feature Inference (RFI) that is easy to integrate with any existing (robust) training procedure without additional test-time computation, and theoretically characterizes the subspace of the eigenspectrum of the feature covariance that is the most robust for a generalized additive model.

Abstract

Test-time defenses are used to improve the robustness of deep neural networks to adversarial examples during inference. However, existing methods either require an additional trained classifier to detect and correct the adversarial samples, or perform additional complex optimization on the model parameters or the input to adapt to the adversarial samples at test time, resulting in a significant increase in inference time compared to the base model. In this work, we propose a novel test-time defense strategy called Robust Feature Inference (RFI) that is easy to integrate with any existing (robust) training procedure without additional test-time computation. Based on the notion of robustness of features that we present, the key idea is to project the trained models to the most robust feature space, thereby reducing the vulnerability to adversarial attacks in non-robust directions. We theoretically characterize the subspace of the eigenspectrum of the feature covariance that is the most robust for a generalized additive model. Our extensive experiments on the CIFAR-10, CIFAR-100, Tiny ImageNet and ImageNet datasets for several robustness benchmarks, including the state-of-the-art methods in RobustBench, show that RFI consistently improves robustness across adaptive and transfer attacks. We also compare RFI with adaptive test-time defenses to demonstrate the effectiveness of our proposed approach.

Paper Structure (46 sections, 5 theorems, 22 equations, 6 figures, 19 tables, 1 algorithm)

Introduction
Related Works
Robust Feature Inference: A Test-time Defense Strategy using Spectral Projections
Robust and Non-Robust Features
Our Algorithm: Robust Feature Inference (RFI)
Robustness vs information of features
Experimental Results
RFI improves adversarial robustness consistently
Transfer Attack Evaluation: RFI is stronger than base model
Static RFI is better than Dynamic/Adaptive RFI
Static RFI outperforms adaptive test-time defenses
Ablation Studies
Effect of adversary strength
Choice of $K$
Comparison of RFI to similar conceptual methods
...and 31 more sections

Key Result

Theorem 3.2

Given $h(\mathbf{x}) = \bm{\beta}^\top \phi(\mathbf{x})$. Assume that the distribution $\mathcal{D}$ is such that $y = h(\mathbf{x}) + \bm{\epsilon}$, where $\bm{\epsilon} \in \mathbb{R}^C$ has independent coordinates, each satisfying $\mathbb{E}[\epsilon_c] = 0$, $\mathbb{E}[\epsilon_c^2] \leq \sigma\ldots$ […] where $\Sigma = \mathbb{E}_{\mathbf{x}} \left[\phi(\mathbf{x})\phi(\mathbf{x})^\top\right]$ and […]

Figures (6)

Figure 1: Illustration of our test-time defense mechanism. Given any trained model $h(\mathbf{x})$, we first post-process the penultimate-layer features $\phi(\mathbf{x})$ using the training data to obtain the eigenspace $\Tilde{\mathbf{U}}$ of the most informative and robust features. At inference, the features $\phi(\mathbf{x}_t)$ of a test point $\mathbf{x}_t$ are projected onto the robust feature space using $\phi(\mathbf{x}_t)\Tilde{\mathbf{U}}\Tilde{\mathbf{U}}^\top$, which is equivalent to replacing $\bm\beta$ with $\Tilde{\bm\beta}=\Tilde{\mathbf{U}}\Tilde{\mathbf{U}}^\top\bm\beta$.
Figure 2: Effect of $K$ in RFI. Robust accuracy and eigenvalue profile in ascending order of all the methods in Table \ref{tab:rfi_calib}.
Figure 3: NTK feature robustness for $\lambda$ and the corresponding eigenvalue profile in ascending order.
Figure 4: Ablation of performance with $K$ for all SoTA models for CIFAR-10 and CIFAR-100.
Figure 5: Eigenspectrum showing sharp drop at $K=$ number of classes for all SoTA models on CIFAR-10 and CIFAR-100.
...and 1 more figures
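
The projection described in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the features and weights are random stand-ins, the helper name `top_k_eigenspace` is hypothetical, and keeping the $K$ eigenvectors with the largest eigenvalues is a simplification, since the paper's actual selection rule for robust directions is not recoverable from this summary. The sketch only checks the equivalence stated in the caption: projecting the test features gives the same outputs as replacing $\bm\beta$ with $\Tilde{\bm\beta}$ once, so no extra work is needed at test time.

```python
import numpy as np


def top_k_eigenspace(features: np.ndarray, k: int) -> np.ndarray:
    """Return a (d, k) matrix whose columns are k eigenvectors of the
    feature covariance Sigma = E[phi(x) phi(x)^T], estimated from the rows
    of `features`.

    ASSUMPTION: directions are ranked by eigenvalue only; the paper's own
    robustness criterion may rank them differently.
    """
    sigma = features.T @ features / features.shape[0]  # (d, d), uncentered
    _, eigvecs = np.linalg.eigh(sigma)  # eigenvalues come back in ascending order
    return eigvecs[:, -k:]  # last k columns = largest eigenvalues


# Hypothetical stand-ins for penultimate-layer features and last-layer weights.
rng = np.random.default_rng(0)
n_train, d, n_classes, k = 500, 16, 4, 4
phi_train = rng.normal(size=(n_train, d)) * np.linspace(0.1, 3.0, d)
beta = rng.normal(size=(d, n_classes))

U = top_k_eigenspace(phi_train, k)  # plays the role of U-tilde
P = U @ U.T                         # orthogonal projector onto the kept subspace
beta_tilde = P @ beta               # modified last-layer weights, computed once

# Two equivalent ways to obtain the defended outputs on test features.
phi_test = rng.normal(size=(5, d))
logits_via_features = (phi_test @ P) @ beta
logits_via_weights = phi_test @ beta_tilde
```

Because the projector is folded into `beta_tilde` ahead of time, inference costs the same single matrix product as the undefended model, which matches the "no additional test-time computation" claim in the abstract.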

Theorems & Definitions (13)

Definition 3.1: $\ell_2$-Robustness of features
Theorem 3.2: Lower bound on robustness
Remark 3.3: Lower bound is tight up to constants
Corollary 3.4
Definition 3.5: Informative features
Corollary 3.6
Proposition 5.1: Learning dynamics of GAM
Proposition 5.2: NTK feature robustness lies at the top
proof
proof
...and 3 more
