Improving Adversarial Robustness via Feature Pattern Consistency Constraint
Jiacong Hu, Jingwen Ye, Zunlei Feng, Jiazhen Yang, Shunyu Liu, Xiaotian Yu, Lingxiang Jia, Mingli Song
TL;DR
This paper tackles adversarial vulnerability by shifting focus from adversarial perturbations to the behavior of latent features under clean training. The authors introduce the Feature Pattern Consistency Constraint (FPCC), a plug-and-play framework comprising Spatial-wise Feature Modification, Channel-wise Feature Selection, and Pattern-based Robustness Optimization, which together enforce correct feature patterns across layers. On CIFAR-10/100 with architectures such as WRN-28-10, ResNet-50, and VGG-16, FPCC shows stronger inherent robustness and faster inference than state-of-the-art defenses, and it passes gradient sanity checks. The work highlights the potential of leveraging clean-data dynamics to improve robustness, while acknowledging that theoretical grounding and comparisons with adversarial purification methods remain future work.
Abstract
Convolutional Neural Networks (CNNs) are well known for their vulnerability to adversarial attacks, which poses significant security concerns. In response, various defense methods have emerged to bolster model robustness. However, most existing methods either learn from adversarial perturbations, which leads to overfitting to the adversarial examples seen in training, or aim to eliminate such perturbations during inference, which inevitably increases computational cost. In contrast, clean training, which strengthens robustness using only clean examples, avoids both issues. In this paper, we follow this line of work and improve its generalization to unknown adversarial examples by scrutinizing the behavior of latent features within the network. Recognizing that a correct prediction relies on a correct latent feature pattern, we introduce a novel and effective Feature Pattern Consistency Constraint (FPCC) method to reinforce the capacity of latent features to maintain the correct feature pattern. Specifically, we propose Spatial-wise Feature Modification and Channel-wise Feature Selection to enhance latent features. We then employ the Pattern Consistency Loss to constrain the similarity between the feature pattern of the latent features and the correct feature pattern. Our experiments demonstrate that FPCC enables latent features to uphold correct feature patterns even in the face of adversarial examples, resulting in inherent adversarial robustness that surpasses state-of-the-art models.
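To make the idea of constraining the similarity between a feature pattern and a correct feature pattern concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not define how a feature pattern is computed or which similarity measure the Pattern Consistency Loss uses, so everything here is an assumption for illustration: the pattern is taken to be the per-channel spatial mean of a feature map, the similarity is cosine similarity, and the names `channel_pattern`, `cosine_similarity`, and `pattern_consistency_loss` are hypothetical.

```python
import math


def channel_pattern(feature_map):
    """Collapse a C x H x W feature map (nested lists) into a length-C vector.

    Assumption: the "feature pattern" is the spatial mean of each channel.
    """
    pattern = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pattern.append(sum(values) / len(values))
    return pattern


def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def pattern_consistency_loss(feature_map, reference_pattern):
    """Hypothetical loss: 1 - cosine similarity to a reference pattern.

    Assumption: reference_pattern stands in for the "correct feature
    pattern" of the sample's class, obtained elsewhere.
    """
    return 1.0 - cosine_similarity(channel_pattern(feature_map), reference_pattern)


# Toy feature map with two channels of size 2 x 2.
toy_features = [
    [[1.0, 1.0], [1.0, 1.0]],  # channel 0 is active
    [[0.0, 0.0], [0.0, 0.0]],  # channel 1 is silent
]

loss_aligned = pattern_consistency_loss(toy_features, [1.0, 0.0])
loss_mismatched = pattern_consistency_loss(toy_features, [0.0, 1.0])
```

Under these assumptions, the loss is close to zero when the activated channels match the reference pattern and close to one when they are orthogonal to it, so minimizing it would push latent features toward the reference pattern.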
