Adversarial Fine-tuning of Compressed Neural Networks for Joint Improvement of Robustness and Efficiency

Hallgrimur Thorsteinsson, Valdemar J Henriksen, Tong Chen, Raghavendra Selvan

TL;DR

This work explores the effects of two different model compression methods -- structured weight pruning and quantization -- on adversarial robustness, and presents the trade-off between standard fine-tuning and adversarial fine-tuning.

Abstract

As deep learning (DL) models are increasingly integrated into our everyday lives, ensuring their safety by making them robust against adversarial attacks has become critical. DL models have been found to be susceptible to adversarial attacks, which introduce small, targeted perturbations into the input data. Adversarial training has been presented as a mitigation strategy that can result in more robust models. This adversarial robustness comes with the additional computational cost of generating adversarial attacks during training. The two objectives -- adversarial robustness and computational efficiency -- then appear to be in conflict with each other. In this work, we explore the effects of two different model compression methods -- structured weight pruning and quantization -- on adversarial robustness. We specifically explore the effects of fine-tuning on compressed models, and present the trade-off between standard fine-tuning and adversarial fine-tuning. Our results show that compression does not inherently lead to a loss in model robustness, and that adversarial fine-tuning of a compressed model can yield large improvements in robustness. We present experiments on two benchmark datasets showing that adversarial fine-tuning of compressed models can achieve robustness comparable to that of adversarially trained models, while also improving computational efficiency.

Paper Structure (17 sections, 5 equations, 4 figures, 6 tables)

This paper contains 17 sections, 5 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Performance of compressed models on Fashion-MNIST and CIFAR10 with adversarial fine-tuning $\mathcal{T}_{ad} (\cdot)$. We perform $\ell_1$-norm pruning (subfigures a, b) and post-training quantization (subfigures c, d) on standard and robust models. In each subfigure, the horizontal axis shows the level of compression performed on the model, and the vertical axis shows the performance. Each model was trained three times and the results were averaged; error bars show the standard deviation between runs. Note that the performance axes are scaled differently for pruning and quantization.
  • Figure 2: Features created by an 8-layer CNN on the subset of the Fashion-MNIST dataset with class "bag". The first column shows the t-SNE visualization generated from standard images and from adversarial images produced by white-box attacks on the standard and robust models. The last three columns show the features generated by the last three hidden layers (layers 6, 7, 8) of three different model pairs: standard and robust baseline models ($f_{st}$ versus $f_{rb}$), quantized (with INT8 post-training quantization) standard models with and without adversarial fine-tuning ($f^q_{st}$ versus $\mathcal{T}_{ad} (f^q_{st})$), and pruned (with 80% sparsity) standard models with standard and adversarial fine-tuning ($\mathcal{T}_{st} (f^p_{st})$ versus $\mathcal{T}_{ad} (f^p_{st})$).
  • Figure 3: Performance of compressed 8-layer CNN models on Fashion-MNIST without fine-tuning. We perform $\ell_1$-norm pruning ($f^p$, left) and post-training quantization ($f^q$, right) on standard and robust models. In each subfigure, the horizontal axis shows the level of compression performed on the model, and the vertical axis shows the performance. Each model was trained three times and the results were averaged; error bars show the standard deviation between runs.
  • Figure 4: Performance of the compressed 8-layer CNN on Fashion-MNIST ($f^q_{rb}$, left) and ResNet-18 on CIFAR10 ($f^q_{rb}$, right) without fine-tuning. We perform quantization-aware training with different precision on robust models. In each subfigure, the horizontal axis shows the level of compression performed on the model, and the vertical axis shows the performance. Each model was trained three times and the results were averaged; error bars show the standard deviation between runs.
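
The captions above refer to $\ell_1$-norm pruning at a given sparsity and to INT8 quantization. As a rough illustration of what these two compression steps do to a weight matrix, the following pure-Python sketch may help. It is not the authors' implementation: the function names, the choice to zero out whole rows (treating each row as one output unit or filter), and the symmetric per-tensor quantization scheme are all assumptions made for this example.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def l1_structured_prune(weights: Matrix, sparsity: float) -> Matrix:
    """Zero out the fraction `sparsity` of rows with the smallest l1 norm.

    Each row stands in for one output unit or filter (an assumption of
    this sketch); a real framework might remove the rows entirely.
    """
    n_rows = len(weights)
    n_prune = int(round(sparsity * n_rows))
    norms = [sum(abs(w) for w in row) for row in weights]
    # Row indices in ascending order of l1 norm; the first n_prune are pruned.
    order = sorted(range(n_rows), key=lambda i: norms[i])
    pruned = set(order[:n_prune])
    return [
        [0.0] * len(row) if i in pruned else list(row)
        for i, row in enumerate(weights)
    ]


def quantize_int8(weights: Matrix) -> Tuple[List[List[int]], float]:
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for row in weights for w in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [
        [max(-127, min(127, round(w / scale))) for w in row]
        for row in weights
    ]
    return q, scale


def dequantize(q: List[List[int]], scale: float) -> Matrix:
    """Map the integer codes back to approximate floating-point weights."""
    return [[v * scale for v in row] for row in q]


# Hypothetical 4x2 weight matrix used only for illustration.
example = [[0.5, -0.5], [0.1, 0.0], [1.0, -1.27], [0.02, 0.03]]
pruned_example = l1_structured_prune(example, sparsity=0.5)
codes, step = quantize_int8(example)
restored = dequantize(codes, step)
```

In this sketch, pruning at 50% sparsity zeroes the two rows with the smallest $\ell_1$ norm, and the quantization round trip changes each weight by at most half a quantization step. Fine-tuning, standard or adversarial, would then update the remaining weights to recover performance.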