Smooth Adversarial Training

Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille, Quoc V. Le

TL;DR

The paper challenges the belief that adversarial robustness requires sacrificing accuracy or additional compute by identifying ReLU's non-smooth gradient as a bottleneck in adversarial training. It introduces Smooth Adversarial Training (SAT), which replaces ReLU with smooth activation functions in both forward and backward passes to improve gradient quality, enabling stronger adversarial examples and better optimization. Empirical results on ImageNet show substantial robustness gains with no accuracy loss or extra cost for ResNet-50, and even larger gains with EfficientNet-L1, surpassing prior state-of-the-art. The work also explores scaling, sanity checks, and CIFAR-10 behavior, highlighting the importance of gradient quality and smooth activations for robust learning.

Abstract

It is commonly believed that networks cannot be both accurate and robust, that gaining robustness means losing accuracy. It is also generally believed that, unless making networks larger, network architectural elements would otherwise matter little in improving adversarial robustness. Here we present evidence to challenge these common beliefs by a careful study about adversarial training. Our key observation is that the widely-used ReLU activation function significantly weakens adversarial training due to its non-smooth nature. Hence we propose smooth adversarial training (SAT), in which we replace ReLU with its smooth approximations to strengthen adversarial training. The purpose of smooth activation functions in SAT is to allow it to find harder adversarial examples and compute better gradient updates during adversarial training. Compared to standard adversarial training, SAT improves adversarial robustness for "free", i.e., no drop in accuracy and no increase in computational cost. For example, without introducing additional computations, SAT significantly enhances ResNet-50's robustness from 33.0% to 42.3%, while also improving accuracy by 0.9% on ImageNet. SAT also works well with larger networks: it helps EfficientNet-L1 to achieve 82.2% accuracy and 58.6% robustness on ImageNet, outperforming the previous state-of-the-art defense by 9.5% for accuracy and 11.6% for robustness. Models are available at https://github.com/cihangxie/SmoothAdversarialTraining.

Paper Structure

This paper contains 37 sections, 8 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: The visualization of ReLU and Parametric Softplus. Left panel: the forward pass for ReLU (blue curve) and Parametric Softplus (red curve). Right panel: the first derivatives for ReLU (blue curve) and Parametric Softplus (red curve). Different from ReLU, Parametric Softplus is smooth with continuous derivatives.
  • Figure 2: Visualizations of 5 different smooth activation functions and their derivatives.
  • Figure 3: Smooth activation functions improve adversarial training. Compared to ReLU, all smooth activation functions significantly boost robustness, while keeping accuracy almost the same.
  • Figure 4: Scaling up EfficientNet in SAT. Note that EfficientNet-L1 is not connected to the rest of the graph because it was not part of the compound scaling suggested by Tan and Le (2019).
  • Figure 5: Comparison of loss landscapes between ReLU baseline and SAT (using SILU) on a randomly selected ImageNet sample.
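
The contrast drawn in Figures 1 and 2 (ReLU's derivative jumps at zero, while a smooth activation's derivative is continuous) can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes the usual softplus form with a sharpness parameter, f(x) = (1/alpha) * log(1 + exp(alpha * x)), and an illustrative default of alpha = 10; neither the formula nor the value is stated in the text above. SiLU is taken as x * sigmoid(x). All function names are hypothetical.

```python
import math


def relu(x: float) -> float:
    """ReLU forward pass: max(0, x)."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """ReLU derivative: a step function (the value at x == 0 is a convention)."""
    return 1.0 if x > 0 else 0.0


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def parametric_softplus(x: float, alpha: float = 10.0) -> float:
    """Assumed form: (1/alpha) * log(1 + exp(alpha * x)).

    Computed as max(z, 0) + log1p(exp(-|z|)) to avoid overflow for large z.
    alpha = 10.0 is an illustrative assumption, not a value from the text.
    """
    z = alpha * x
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / alpha


def parametric_softplus_grad(x: float, alpha: float = 10.0) -> float:
    """Derivative of the form above: sigmoid(alpha * x), continuous everywhere."""
    return sigmoid(alpha * x)


def silu(x: float) -> float:
    """SiLU forward pass: x * sigmoid(x)."""
    return x * sigmoid(x)


def silu_grad(x: float) -> float:
    """SiLU derivative: s + x * s * (1 - s), where s = sigmoid(x)."""
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))


if __name__ == "__main__":
    eps = 1e-6
    # Gap between the derivative just right and just left of zero.
    print("ReLU gap:    ", relu_grad(eps) - relu_grad(-eps))
    print("Softplus gap:", parametric_softplus_grad(eps) - parametric_softplus_grad(-eps))
    print("SiLU gap:    ", silu_grad(eps) - silu_grad(-eps))
```

Under these assumptions, ReLU's derivative changes by a full unit across zero, whereas the two smooth activations change only by an amount proportional to the step size. Larger alpha values bring the softplus curve closer to ReLU while keeping its derivative continuous.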