Improving Adversarial Robustness via Feature Pattern Consistency Constraint

Jiacong Hu, Jingwen Ye, Zunlei Feng, Jiazhen Yang, Shunyu Liu, Xiaotian Yu, Lingxiang Jia, Mingli Song

TL;DR

This paper tackles adversarial vulnerability by shifting focus from perturbations to the behavior of latent features under clean training. The authors introduce the Feature Pattern Consistency Constraint (FPCC), a plug-and-play framework comprising Spatial-wise Feature Modification, Channel-wise Feature Selection, and Pattern-based Robustness Optimization to enforce correct feature patterns across layers. FPCC demonstrates superior inherent robustness and inference speed compared with state-of-the-art defenses on CIFAR-10/100 with architectures like WRN-28-10, ResNet-50, and VGG-16, while passing gradient sanity checks. The work highlights the potential of leveraging clean-data dynamics to improve robustness, though it acknowledges the need for theoretical grounding and comparisons with adversarial purification methods in future work.

Abstract

Convolutional Neural Networks (CNNs) are well known for their vulnerability to adversarial attacks, posing significant security concerns. In response to these threats, various defense methods have emerged to bolster the model's robustness. However, most existing methods either focus on learning from adversarial perturbations, leading to overfitting to the adversarial examples, or aim to eliminate such perturbations during inference, inevitably increasing computational burdens. Conversely, clean training, which strengthens the model's robustness by relying solely on clean examples, can address the aforementioned issues. In this paper, we align with this methodological stream and enhance its generalizability to unknown adversarial examples. This enhancement is achieved by scrutinizing the behavior of latent features within the network. Recognizing that a correct prediction relies on the correctness of the latent feature's pattern, we introduce a novel and effective Feature Pattern Consistency Constraint (FPCC) method to reinforce the latent feature's capacity to maintain the correct feature pattern. Specifically, we propose Spatial-wise Feature Modification and Channel-wise Feature Selection to enhance latent features. Subsequently, we employ the Pattern Consistency Loss to constrain the similarity between the feature pattern of the latent features and the correct feature pattern. Our experiments demonstrate that the FPCC method empowers latent features to uphold correct feature patterns even in the face of adversarial examples, resulting in inherent adversarial robustness surpassing state-of-the-art models.

Paper Structure (18 sections, 11 equations, 4 figures, 5 tables, 1 algorithm)


Figures (4)

  • Figure 1: Feature Pattern Consistency Constraint training framework. Typically, 'CFS' and 'PRO' are configured in pairs within the network.
  • Figure 2: Feature patterns of a correctly predicted sample, an incorrectly predicted sample, and their corresponding ground-truth category. The feature pattern of the ground-truth category is derived by averaging the feature patterns of the top 10 correctly predicted samples, identified based on the highest predicted probabilities. The horizontal axis denotes the feature dimensions, while the vertical axis represents the relative magnitude of the features. Both correct and incorrect samples are randomly selected from the 'dog' category of the CIFAR-10 dataset. To streamline the illustration, only the first $10$ dimensions of the penultimate layer (fully connected layer) of the VGG-16 network are displayed.
  • Figure 3: Impact of the positions and quantities of CFS and PRO on accuracy. The blue and red lines represent the cumulative insertion of CFS and PRO into the network from the shallow to the deep layers and from the deep to the shallow layers, respectively. Experiments were conducted on VGG-16 using CIFAR-10, and robust accuracy was measured against the PGD ($\ell_2$) attack.
  • Figure 4: Impact of the proportion of features selected in CFS on accuracy. The variable $\gamma$ in the CFS equation controls the proportion of features selected: a larger $\gamma$ value corresponds to fewer features being selected. Experiments were conducted on VGG-16 using CIFAR-10, and robust accuracy was measured against the PGD ($\ell_2$) attack.
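
The Figure 2 caption describes how the feature pattern of a ground-truth category is obtained: average the feature patterns of the top 10 correctly predicted samples, ranked by predicted probability. A minimal sketch of that averaging step follows. It assumes that a sample's feature pattern is simply its feature vector at the chosen layer; the caption does not define the normalization behind "relative magnitude", so none is applied here. The function name `class_feature_pattern` and its parameters are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def class_feature_pattern(
    features: Sequence[Sequence[float]],
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    target_class: int,
    k: int = 10,
) -> List[float]:
    """Average the feature vectors of the k most confident correct predictions.

    features: one feature vector per sample (assumed layer output).
    probs: one predicted class-probability vector per sample.
    labels: ground-truth class index per sample.
    """
    candidates = []
    for feat, prob, label in zip(features, probs, labels):
        # Predicted class is the index of the largest probability.
        predicted = max(range(len(prob)), key=prob.__getitem__)
        # Keep only samples of the target class that are classified correctly.
        if label == target_class and predicted == label:
            candidates.append((prob[predicted], feat))

    if not candidates:
        raise ValueError("no correctly predicted samples for this class")

    # Rank by confidence (highest first) and keep at most k samples.
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = [feat for _, feat in candidates[:k]]

    # Element-wise mean over the selected feature vectors.
    dim = len(top[0])
    return [sum(feat[d] for feat in top) / len(top) for d in range(dim)]
```

In practice the feature vectors would come from a forward pass through the network (the caption uses the penultimate layer of VGG-16), and the function would be called once per category. If a class has fewer than `k` correct predictions, this sketch averages whatever is available, which is a design choice of the sketch rather than something the caption specifies.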