Indirect Gradient Matching for Adversarial Robust Distillation

Hongsin Lee; Seungju Cho; Changick Kim

TL;DR

This work tackles the robustness gap between large and small models under adversarial training by introducing the Indirect Gradient Distillation Module (IGDM). IGDM exploits the local linearity of adversarially trained models to align the student's input gradient with the teacher's indirectly: it matches the two models' output differences under perturbations, which lets it plug into existing adversarial distillation methods. Empirical results on CIFAR-100 and other benchmarks show that IGDM consistently improves AutoAttack accuracy and related robustness metrics while improving gradient alignment (lower Gradient Distance and higher Gradient Cosine similarity) across multiple teacher–student pairs. The approach offers a modular, efficient way to transfer gradient information, reducing the need for heavy changes to the inner maximization step and improving practical robustness in resource-constrained settings.
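The indirect matching idea can be sketched with a first-order expansion. The notation here is assumed for illustration and is not taken from the paper's own equations: $f_S$ and $f_T$ are the student and teacher outputs, $J_f(x)$ is the Jacobian of $f$ with respect to the input, $\delta$ is a small perturbation, and $D$ is some distance. The specific form of the output difference is likewise an assumption.

$$
f(x+\delta) - f(x) \;\approx\; J_f(x)\,\delta \qquad \text{(local linearity around } x\text{)}
$$

$$
D\big(f_S(x+\delta) - f_S(x),\; f_T(x+\delta) - f_T(x)\big) \;\approx\; D\big(J_{f_S}(x)\,\delta,\; J_{f_T}(x)\,\delta\big)
$$

Under this reading, shrinking the left-hand side pulls the student's input gradient toward the teacher's along the direction $\delta$ using only forward passes on perturbed inputs, rather than differentiating through an explicit gradient term.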

Abstract

Adversarial training significantly improves adversarial robustness, but the best performance is attained primarily with large models. This substantial performance gap for smaller models has spurred active research into adversarial distillation (AD) to narrow it. Existing AD methods use the teacher's logits as a guide. In contrast, we aim to transfer another piece of knowledge from the teacher: the input gradient. In this paper, we propose a distillation module, termed the Indirect Gradient Distillation Module (IGDM), that indirectly matches the student's input gradient with that of the teacher. Experimental results show that IGDM integrates seamlessly with existing AD methods and significantly enhances their performance. In particular, on the CIFAR-100 dataset, IGDM improves AutoAttack accuracy from 28.06% to 30.32% with the ResNet-18 architecture and from 26.18% to 29.32% with the MobileNetV2 architecture when integrated into the SOTA method without additional data augmentation.
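As a rough illustration of how such a module could sit alongside an existing AD objective, the sketch below computes an output-difference matching term on toy models and adds it to a placeholder AD loss. Everything named here is hypothetical: `make_toy_model`, `output_difference`, `indirect_gradient_loss`, `total_loss`, the weight `alpha`, the squared L2 distance, and the `f(x + delta) - f(x)` form of the difference are assumptions for illustration, not the paper's definitions. A real setup would use neural networks in an autodiff framework and an adversarial perturbation for `delta`.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Sequence[float]], Vector]


def make_toy_model(weights: Sequence[Sequence[float]]) -> Model:
    """Return a tiny model x -> tanh(W x); a stand-in for a real network."""
    def forward(x: Sequence[float]) -> Vector:
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]
    return forward


def output_difference(model: Model, x: Sequence[float],
                      delta: Sequence[float]) -> Vector:
    """f(x + delta) - f(x): a first-order proxy for the Jacobian applied to delta."""
    shifted = [xi + di for xi, di in zip(x, delta)]
    return [a - b for a, b in zip(model(shifted), model(x))]


def indirect_gradient_loss(student: Model, teacher: Model,
                           x: Sequence[float], delta: Sequence[float]) -> float:
    """Squared L2 gap between student and teacher output differences.

    Assumption: squared L2 is used only for illustration.
    """
    d_student = output_difference(student, x, delta)
    d_teacher = output_difference(teacher, x, delta)
    return sum((a - b) ** 2 for a, b in zip(d_student, d_teacher))


def total_loss(ad_loss: float, igdm_loss: float, alpha: float) -> float:
    """Add the matching term to an existing AD loss; alpha is a hypothetical weight."""
    return ad_loss + alpha * igdm_loss


if __name__ == "__main__":
    teacher = make_toy_model([[1.0, -0.5], [0.3, 0.8]])
    student = make_toy_model([[-1.0, 0.5], [0.9, -0.2]])
    x, delta = [0.2, -0.1], [0.01, -0.01]
    matching = indirect_gradient_loss(student, teacher, x, delta)
    print(total_loss(ad_loss=1.0, igdm_loss=matching, alpha=2.0))
```

Keeping the matching term as a separate additive function mirrors the claim that the module can be attached to an existing AD method without rewriting that method's own loss.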

Paper Structure (43 sections, 15 equations, 6 figures, 18 tables, 1 algorithm)

Introduction
Related Work
Adversarial Training
Adversarial Robust Distillation
Gradient Distillation and Input Gradient
Method
Local Linearity of Adversarial Training
Gradient Matching via Output Differences
Indirect Gradient Distillation Module (IGDM)
Experiments
Settings
Teacher and Student Models
Evaluation Metrics
Implementation
Results
...and 28 more sections

Figures (6)

Figure 1: Conceptual diagram of IGDM
Figure 2: Performance of IGDM
Figure 4:
Figure 7: Correlation between GC and AA (left) and between GD and AA (right). All results were obtained using ResNet-18 and a BDM-AT teacher on CIFAR-100. The values for IGDM and AD methods match those in Table \ref{tab:main_resnet_cifar100}, while 'Others' represent results from additional experiments under the same configuration. $\rho$ denotes the correlation coefficient.
Figure 8: Performance comparison of different adversarial distillation methods across various teacher models.
...and 1 more figure
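The GD and GC quantities referenced in the Figure 7 caption compare student and teacher input gradients. A minimal sketch of how such alignment metrics could be computed follows. It assumes GD is an L2 distance and GC a cosine similarity between the two gradient vectors; the exact definitions are not given in this section. The helper names (`input_gradient`, `gradient_distance`, `gradient_cosine`) are hypothetical, and the finite-difference gradient stands in for the autodiff call a real pipeline would use.

```python
import math
from typing import Callable, List, Sequence


def input_gradient(loss_fn: Callable[[Sequence[float]], float],
                   x: Sequence[float], h: float = 1e-5) -> List[float]:
    """Central finite-difference estimate of d(loss)/dx.

    Stand-in for autodiff; loss_fn maps an input vector to a scalar loss.
    """
    grad = []
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += h
        minus[i] -= h
        grad.append((loss_fn(plus) - loss_fn(minus)) / (2 * h))
    return grad


def gradient_distance(g_student: Sequence[float],
                      g_teacher: Sequence[float]) -> float:
    """Assumed GD: L2 distance between the two input gradients (lower is closer)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g_student, g_teacher)))


def gradient_cosine(g_student: Sequence[float], g_teacher: Sequence[float],
                    eps: float = 1e-12) -> float:
    """Assumed GC: cosine similarity between the two input gradients (higher is closer)."""
    dot = sum(a * b for a, b in zip(g_student, g_teacher))
    norm_s = math.sqrt(sum(a * a for a in g_student))
    norm_t = math.sqrt(sum(b * b for b in g_teacher))
    return dot / (norm_s * norm_t + eps)  # eps guards against zero gradients


if __name__ == "__main__":
    # Toy scalar losses standing in for student and teacher losses on one input.
    student_loss = lambda x: x[0] ** 2 + 3.0 * x[1]
    teacher_loss = lambda x: 2.0 * x[0] + x[1] ** 2
    x = [1.0, 2.0]
    g_s = input_gradient(student_loss, x)
    g_t = input_gradient(teacher_loss, x)
    print(gradient_distance(g_s, g_t), gradient_cosine(g_s, g_t))
```

In an evaluation like the one the caption describes, these per-example values would be averaged over a test set before being compared against AA.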
