Adversarial Training via Adaptive Knowledge Amalgamation of an Ensemble of Teachers

Shayan Mohajer Hamidi; Linfeng Ye

TL;DR

This work tackles the dual challenges of achieving adversarial robustness in small neural networks and generalizing robustness to unseen attack types. It introduces AT-AKA, which uses an ensemble of teachers trained on SVGD-generated diverse adversarial samples and adaptively amalgamates their logits to train a generalized robust student. The authors extend the framework with CAT-AKA, a collaborative online KD variant that shares knowledge across multiple students. Empirical results on CIFAR-10/100 against FGSM, PGD, and AutoAttack show that AT-AKA and CAT-AKA deliver superior adversarial robustness with minimal losses in clean accuracy, outperforming several state-of-the-art adversarial training and robustness-distillation methods.

Abstract

Adversarial training (AT) is a popular method for training robust deep neural networks (DNNs) against adversarial attacks. Yet, AT suffers from two shortcomings: (i) the robustness of DNNs trained by AT is highly intertwined with the size of the DNNs, posing challenges in achieving robustness in smaller models; and (ii) the adversarial samples employed during the AT process exhibit poor generalization, leaving DNNs vulnerable to unforeseen attack types. To address these dual challenges, this paper introduces adversarial training via adaptive knowledge amalgamation of an ensemble of teachers (AT-AKA). In particular, we generate a diverse set of adversarial samples as the inputs to an ensemble of teachers, and then adaptively amalgamate the logits of these teachers to train a generalized-robust student. Through comprehensive experiments, we illustrate the superior efficacy of AT-AKA over existing AT methods and adversarial robustness distillation techniques against cutting-edge attacks, including AutoAttack.

Paper Structure (27 sections, 9 equations, 3 figures, 5 tables)

Introduction
Related Works
Adversarial defenses
Adversarial training via knowledge distillation
Ensemble of sub-models for robustness
Preliminaries and Notations
Notations
Preliminaries
Motivation
Methodology and Formulation
Generating adversarial samples
SVGD Vs. PGD
AT-AKA mechanism
Adaptive amalgamation functions
Naive AT-AKA.
...and 12 more sections

Figures (3)

Figure 1: A two-dimensional caricature of the classification boundaries learnt by different methods: (a) an ordinary learning method using the clean samples; (b) adversarial training where PGD is used to generate adversarial samples; (c) $T_1$ via first set of adversarial samples; (d) $T_2$ via second set of adversarial samples; and (e) amalgamation of $T_1$ and $T_2$.
Figure 2: The AT-AKA framework. The teachers and the student are denoted by $\{T_i\}_{i \in [n]}$ and $S$, respectively. All the $\{T_i\}_{i \in [n]}$ and $S$ are trained from scratch. The clean sample $\boldsymbol{x}$ is fed to the student, whereas $n$ distinct adversarial samples generated by SVGD are fed to the teachers. Then, the logits of the teachers are amalgamated, and the resulting logit is used to distill knowledge into the student.
Figure 3: The CAT-AKA framework. All the models are students, denoted by $\{S_i\}_{i \in [n]}$. All the students are trained from scratch. The clean sample $\boldsymbol{x}$ is fed to the SVGD block to generate $n$ distinct adversarial samples. Then, the logits of the students are amalgamated, and the resulting logit is used as a supervision for the students.
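
The data flow described in the Figure 2 caption (teacher logits are amalgamated, and the merged logit supervises the student) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `amalgamate` and `distillation_loss`, the confidence-based weighting rule, the temperature, and the hard-coded logits are hypothetical and are not the paper's adaptive amalgamation functions, which this page does not reproduce. SVGD sample generation is omitted entirely.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def amalgamate(teacher_logits, label):
    """Merge teacher logits with a weighted average.

    ASSUMPTION: each teacher's weight is proportional to the probability
    it assigns to the true label. This rule is a placeholder, not the
    amalgamation function defined in the paper.
    """
    confidences = [softmax(z)[label] for z in teacher_logits]
    total = sum(confidences)
    weights = [c / total for c in confidences]
    num_classes = len(teacher_logits[0])
    merged = [
        sum(w * z[k] for w, z in zip(weights, teacher_logits))
        for k in range(num_classes)
    ]
    return merged, weights

def distillation_loss(student_logits, merged_logits, temperature=2.0):
    """KL divergence from the softened merged-teacher distribution
    to the softened student distribution."""
    p = softmax(merged_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Placeholder logits: two teachers, each assumed to have seen a different
# adversarial version of the same input; the student sees the clean input.
teacher_logits = [[2.0, 0.5, -1.0], [0.2, 1.5, 0.1]]
student_logits = [1.0, 0.3, -0.5]

merged, weights = amalgamate(teacher_logits, label=0)
loss = distillation_loss(student_logits, merged)
```

In this toy setup the first teacher is more confident on the true class, so it receives the larger weight; in a training loop, `loss` would be minimized with respect to the student's parameters.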

Theorems & Definitions (1)

Remark
