PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor

Jaewon Jung, Hongsun Jang, Jaeyong Song, Jinho Lee

TL;DR

This paper introduces PeerAiD, an online adversarial distillation framework that trains a specialized peer tutor together with a student to defend against adversarial examples crafted for the student. By generating adversarial examples from the student and using a jointly trained peer to supervise the student via a carefully designed loss, PeerAiD avoids reliance on a fixed pretrained teacher and yields stronger AutoAttack robustness and better natural accuracy on multiple datasets and models. The approach produces flatter loss landscapes and more discriminative feature representations, while experiments show robustness gains across CIFAR-10/100 and TinyImageNet without gradient obfuscation. Overall, PeerAiD offers a practical, scalable method to boost robustness in small models for security-critical applications.

Abstract

Adversarial robustness of neural networks is a significant concern when they are applied to security-critical domains. In this setting, adversarial distillation is a promising option that aims to distill the robustness of a teacher network to improve the robustness of a small student network. Previous works pretrain the teacher network to make it robust against adversarial examples aimed at itself. However, adversarial examples depend on the parameters of the target network. The fixed teacher network therefore inevitably loses robustness against the unseen, transferred adversarial examples that target the parameters of the student network during adversarial distillation. We propose PeerAiD, which makes a peer network learn the adversarial examples of the student network instead of adversarial examples aimed at itself. PeerAiD is an adversarial distillation method that trains the peer network and the student network simultaneously in order to specialize the peer network for defending the student network. We observe that such peer networks surpass the robustness of the pretrained robust teacher model against adversarial examples aimed at the student network. With this peer network and adversarial distillation, PeerAiD improves the robustness of the student network, measured by AutoAttack (AA) accuracy, by up to 1.66%p and improves the natural accuracy of the student network by up to 4.72%p with ResNet-18 on the TinyImageNet dataset. Code is available at https://github.com/jaewonalive/PeerAiD.
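
The joint training idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear-logistic models, the cross-entropy attack objective, the soft-label student loss, and all names (`pgd_on_student`, `joint_step`, `eps`, `alpha`, `lr`) are assumptions chosen for brevity, and the loss actually used by PeerAiD is not given in this section. The sketch only shows the structure: adversarial examples are crafted against the student, the peer is trained on those examples, and the student is supervised by the peer's predictions in the same step.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(model, x):
    """Probability of class 1 under a linear-logistic model (w, b)."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def ce(model, x, y):
    """Binary cross-entropy of the model on a single example."""
    p = predict(model, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def sign(v):
    return (v > 0) - (v < 0)

def pgd_on_student(student, x, y, eps=0.1, alpha=0.025, steps=10):
    """L-inf PGD that maximizes the STUDENT's cross-entropy (assumed objective)."""
    w, _ = student
    x_adv = list(x)
    for _ in range(steps):
        p = predict(student, x_adv)
        grad = [(p - y) * wi for wi in w]  # d CE / d x for a linear-logistic model
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back into the eps-ball around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

def sgd_step(model, x, grad_logit, lr):
    """One SGD update given the loss gradient with respect to the logit."""
    w, b = model
    new_w = [wi - lr * grad_logit * xi for wi, xi in zip(w, x)]
    return (new_w, b - lr * grad_logit)

def joint_step(student, peer, x, y, lr=0.1):
    """Update peer and student together on the student's adversarial example."""
    x_adv = pgd_on_student(student, x, y)
    q = predict(peer, x_adv)       # peer's soft label, held constant for the student
    p_s = predict(student, x_adv)
    # Peer: cross-entropy with the hard label on the student's adversarial example.
    new_peer = sgd_step(peer, x_adv, q - y, lr)
    # Student: soft-label cross-entropy toward the peer (logit gradient is p_s - q).
    new_student = sgd_step(student, x_adv, p_s - q, lr)
    return new_student, new_peer, x_adv

if __name__ == "__main__":
    student, peer = ([0.5, -0.3], 0.1), ([0.0, 0.0], 0.0)
    data = [([1.0, 1.0], 1), ([-1.0, -1.0], 0)]
    for _ in range(50):
        for x, y in data:
            student, peer, _ = joint_step(student, peer, x, y)
    print("student:", student)
    print("peer:", peer)
```

Because the attack is recomputed from the current student at every step, the peer keeps seeing adversarial examples that track the student's parameters, which is the property a fixed pretrained teacher lacks.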

Paper Structure (28 sections, 5 equations, 6 figures, 13 tables)

This paper contains 28 sections, 5 equations, 6 figures, 13 tables.

Figures (6)

  • Figure 1: Test robust accuracy of a pretrained robust teacher and a peer network trained from scratch. $x^{*}_{S}$, $x^{*}_{T}$, and $x^{*}_{P}$ denote the adversarial examples generated from the student, teacher, and peer model, respectively. Results on CIFAR-10 (left) and CIFAR-100 (right) show that the peer sustains increasing robustness against the student's adversarial examples, which the pretrained teacher does not provide.
  • Figure 2: Overview of the adversarial distillation procedure of the baselines (IAD, RSLAD) and PeerAiD. (a) describes the inner maximization that generates adversarial examples from the student model. Baselines use hard labels or pretrained teachers for this, whereas PeerAiD uses a peer model. (b) illustrates the outer minimization that optimizes the model parameters. Baselines train only the student models with the predictions of the pretrained teachers, but PeerAiD trains both the peer and student models simultaneously.
  • Figure 3: Robust accuracy against student-generated adversarial examples $x^{*}_{S}$. Test (a) and train (b) robust accuracy are shown. ResNet-18 is used to measure the robust accuracy on CIFAR-100.
  • Figure 4: Comparison of weight loss landscape visualizations between the baselines (PGD, TRADES) and PeerAiD. The WRN34-10 model trained on CIFAR-100 by each method is perturbed along a random direction within the range [$-0.75$, $0.75$]. The vertical axis $z$ denotes the loss value.
  • Figure 5: t-SNE results of the penultimate layer representation with the pretrained robust teacher model and PeerAiD.
  • ...and 1 more figures