Parameter Interpolation Adversarial Training for Robust Image Classification

Xin Liu, Yichen Yang, Kun He, John E. Hopcroft

TL;DR

The paper tackles robustness oscillations and overfitting in adversarial training by introducing Parameter Interpolation Adversarial Training (PIAT), which linearly interpolates model parameters between successive epochs with a dynamically increasing weight. It couples this with NMSE, a logit-normalization-based regularizer that aligns relative logit magnitudes between clean and adversarial examples, and combines them in a single loss L = L_CE + μ L_NMSE. Across CNNs and Vision Transformers on CIFAR-10/100 and SVHN, PIAT+NMSE consistently improves robust accuracy under strong attacks (PGD20, CW, AutoAttack) while preserving clean accuracy, outperforming several baselines. The approach is validated via toy experiments, theoretical insights (Theorem 3.1 on interpolation stability), extensive ablations, and loss-landscape analyses, highlighting its stability and generality. The method offers a practical, adaptable route to more reliable adversarial defenses without altering the core AT framework.
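As a rough illustration of the interpolation step described above, the following Python sketch blends the parameters from the previous epoch with those from the current epoch. The names `interpolation_weight` and `interpolate_params`, the constant `c`, the schedule `epoch / (epoch + c)`, and the choice to put the growing weight on the previous-epoch parameters are all assumptions made for illustration. The summary only states that the weight increases dynamically, so the paper's actual schedule may differ.

```python
from typing import Dict, List

# Parameters are modeled as named lists of floats to keep the sketch
# dependency-free; a real model would use framework tensors instead.
Params = Dict[str, List[float]]


def interpolation_weight(epoch: int, c: float = 10.0) -> float:
    """Hypothetical schedule: 0 at epoch 0, rising toward 1 as epochs pass."""
    if epoch < 0 or c <= 0:
        raise ValueError("epoch must be >= 0 and c must be > 0")
    return epoch / (epoch + c)


def interpolate_params(prev: Params, curr: Params, lam: float) -> Params:
    """Return lam * prev + (1 - lam) * curr, element by element.

    Assumption: lam weights the previous-epoch parameters.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return {
        name: [lam * p + (1.0 - lam) * q for p, q in zip(prev[name], curr[name])]
        for name in curr
    }


if __name__ == "__main__":
    prev = {"w": [0.0, 2.0]}   # parameters kept after the previous epoch
    curr = {"w": [1.0, 4.0]}   # parameters after training the current epoch
    lam = interpolation_weight(epoch=10)
    print(lam, interpolate_params(prev, curr, lam))
```

Under this hypothetical schedule, early epochs follow the freshly trained parameters closely, while later epochs lean more on the previous-epoch parameters, which is one way to damp epoch-to-epoch swings in the decision boundary.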

Abstract

Though deep neural networks exhibit superior performance on various tasks, they are still plagued by adversarial examples. Adversarial training has been demonstrated to be the most effective method to defend against adversarial attacks. However, models trained with existing adversarial training methods exhibit apparent oscillations in robustness and overfitting during training, which degrades defense efficacy. To address these issues, we propose a novel framework called Parameter Interpolation Adversarial Training (PIAT). PIAT tunes the model parameters between epochs by interpolating the parameters of the previous and current epochs. This makes the model's decision boundary change more moderately and alleviates overfitting, helping the model converge better and achieve higher robustness. In addition, we suggest using the Normalized Mean Square Error (NMSE) to further improve robustness by aligning the relative magnitude of logits between clean and adversarial examples rather than the absolute magnitude. Extensive experiments conducted on several benchmark datasets demonstrate that our framework markedly improves the robustness of both Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs).
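To make the idea of aligning relative rather than absolute logit magnitudes concrete, here is a minimal Python sketch of an NMSE-style regularizer. It assumes that each logit vector is divided by its L2 norm and that the squared differences are averaged over classes. Neither detail is stated in this summary, so this should be read as one plausible form and not as the paper's exact definition. The function names `l2_normalize`, `nmse`, and `total_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(logits: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale a logit vector to unit L2 norm (assumed normalization)."""
    norm = math.sqrt(sum(z * z for z in logits))
    return [z / max(norm, eps) for z in logits]


def nmse(clean_logits: Sequence[float], adv_logits: Sequence[float]) -> float:
    """Mean squared error between the normalized clean and adversarial logits."""
    if len(clean_logits) != len(adv_logits) or not clean_logits:
        raise ValueError("logit vectors must be non-empty and of equal length")
    zc = l2_normalize(clean_logits)
    za = l2_normalize(adv_logits)
    return sum((a - b) ** 2 for a, b in zip(zc, za)) / len(zc)


def total_loss(ce_loss: float, nmse_loss: float, mu: float) -> float:
    """Combined objective L = L_CE + mu * L_NMSE, as given in the TL;DR."""
    return ce_loss + mu * nmse_loss


if __name__ == "__main__":
    clean = [1.0, 2.0, 3.0]
    scaled = [2.0, 4.0, 6.0]    # same direction, twice the magnitude
    shifted = [3.0, 2.0, 1.0]   # different relative ordering
    print(nmse(clean, scaled))   # close to zero: only the scale differs
    print(nmse(clean, shifted))  # clearly positive: relative magnitudes differ
```

Because both vectors are normalized before comparison, rescaling the adversarial logits by a positive constant leaves the penalty essentially unchanged. This contrasts with a plain squared error on raw logits, which would penalize differences in absolute magnitude.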

Paper Structure

This paper contains 17 sections, 13 equations, 9 figures, 5 tables, 1 algorithm.

Figures (9)

  • Figure 1: The robust accuracy of ResNet18 trained on the CIFAR10 dataset with existing advanced adversarial training methods shows apparent oscillations and overfitting during training. In contrast, our PIAT framework achieves excellent robust accuracy with better convergence, further improving model performance.
  • Figure 2: The data distribution of the toy example, which consists of two concentric circles with different radii. The class 1 data are primarily located within the inner circle, while the class 2 data are mainly distributed outside it.
  • Figure 3: Illustrations of defense performance under PGD adversarial attack. The first figure illustrates the accuracy and robustness of the toy model trained using PGD-AT and PIAT on the 3D dataset, while the second figure demonstrates the accuracy and robustness of the toy model with ALP and NMSE regularization on the same dataset.
  • Figure 4: Illustrations of the 2D decision boundary of the model trained using PGD-AT at the 27$^\text{th}$ epoch. The corresponding data points are marked by circles. While the blue data points near the top left of the decision boundary are correctly classified, the red data points situated around the top left and right are misclassified.
  • Figure 5: Illustrations of the decision boundaries of PGD-AT and PIAT before and after an epoch. Each subfigure shows the decision boundary before (light-colored) and after (dark-colored) one epoch of adversarial training. Specifically, the light-colored data represent the 26$^\text{th}$ epoch of PGD-AT and the 24$^\text{th}$ epoch of PIAT, while the dark-colored data correspond to the 27$^\text{th}$ epoch of PGD-AT and the 25$^\text{th}$ epoch of PIAT.
  • ...and 4 more figures