LTD: Low Temperature Distillation for Gradient Masking-free Adversarial Training

Erh-Chung Chen; Che-Rung Lee

TL;DR

This work addresses a vulnerability of deep networks that arises from assuming one-hot labels for real-world data, which is often ambiguous. It introduces Low-Temperature Distillation (LTD), a knowledge-distillation-based adversarial training framework in which a teacher with a relatively low temperature generates soft labels and the student keeps a fixed temperature, so that gradients stay informative and gradient masking is avoided. Empirical results on CIFAR-10, CIFAR-100, and ImageNet show robustness gains, particularly when LTD is combined with AWP, without requiring extra data. LTD thus provides a practical way to relax closed-world assumptions and improve gradient quality for reliable adversarial defense on large-scale, real-world datasets.

Abstract

Adversarial training is a widely adopted strategy to bolster the robustness of neural network models against adversarial attacks. This paper revisits the fundamental assumptions underlying image classification and suggests that representing data as one-hot labels is a key factor that leads to vulnerabilities. In real-world datasets, data ambiguity often arises, with samples exhibiting characteristics of multiple classes, which renders one-hot label representations imprecise. To address this, we introduce a novel approach, Low-Temperature Distillation (LTD), designed to refine label representations. Unlike previous approaches, LTD uses a relatively low temperature in the teacher model while keeping the temperature of the student model fixed during both training and inference. This strategy not only refines assumptions about the data distribution but also strengthens model robustness and avoids the gradient masking problem commonly encountered in defensive distillation. Experimental results demonstrate the efficacy of the proposed method when combined with existing frameworks, achieving robust accuracies of 58.19%, 31.13%, and 42.08% on the CIFAR-10, CIFAR-100, and ImageNet datasets, respectively, without the need for additional data.
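The role of the teacher temperature can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the logits, the temperature values (0.5 and 5.0), and the helper names `log_softmax`, `softmax`, and `soft_label_cross_entropy` are hypothetical choices made for illustration, and the loss shown is a plain cross-entropy against soft labels rather than the paper's exact objective. Adversarial example generation is omitted.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Log of the temperature-scaled softmax, log softmax(z / T)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_total = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_total for s in scaled]

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, softmax(z / T)."""
    return [math.exp(lp) for lp in log_softmax(logits, temperature)]

def soft_label_cross_entropy(student_logits, soft_labels, student_temperature=1.0):
    """Cross-entropy between soft labels and the student's prediction.

    The student temperature is a fixed constant (assumed 1.0 here),
    used identically at training and inference time.
    """
    log_probs = log_softmax(student_logits, student_temperature)
    return -sum(q * lp for q, lp in zip(soft_labels, log_probs))

# Hypothetical teacher logits for an ambiguous 3-class sample.
teacher_logits = [4.0, 3.0, 0.5]

# A lower teacher temperature gives sharper soft labels that still
# assign non-zero mass to the second-most plausible class.
soft_low_t = softmax(teacher_logits, temperature=0.5)
soft_high_t = softmax(teacher_logits, temperature=5.0)

# Hypothetical student logits, trained against the low-temperature labels.
student_logits = [2.0, 1.5, -1.0]
loss = soft_label_cross_entropy(student_logits, soft_low_t)
```

In this sketch, the low-temperature labels sit between a one-hot vector and the nearly uniform high-temperature labels: the top class dominates, yet the ambiguity between the first two classes is still encoded in the target.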
