Keeping the Bad Guys Out: Protecting and Vaccinating Deep Learning with JPEG Compression

Nilaksh Das, Madhuri Shanbhogue, Shang-Tse Chen, Fred Hohman, Li Chen, Michael E. Kounavis, Duen Horng Chau

TL;DR

The paper addresses the vulnerability of deep neural networks to adversarial examples by proposing JPEG compression as a pragmatic, model-agnostic pre-processing defense. It demonstrates that removing high-frequency perturbations via JPEG compression, together with training on compressed images ("vaccination") and an ensemble of models across compression levels, can substantially reduce attack success rates on CIFAR-10 and GTSRB. Key contributions include a systematic analysis of the effect of JPEG quality, vaccination through models trained on JPEG-compressed images, and a fortified ensemble that stays robust against multiple attack types without requiring model-specific knowledge. The work offers a practical defense that leverages standard image compression for real-world adversarial protection, with plans to extend the evaluation to additional attacks and datasets.

Abstract

Deep neural networks (DNNs) have achieved great success in solving a variety of machine learning (ML) problems, especially in the domain of image recognition. However, recent research has shown that DNNs can be highly vulnerable to adversarially generated instances, which look normal to human observers but completely confuse DNNs. These adversarial samples are crafted by adding small perturbations to normal, benign images. Such perturbations, while imperceptible to the human eye, are picked up by DNNs and cause them to misclassify the manipulated instances with high confidence. In this work, we explore and demonstrate how systematic JPEG compression can work as an effective pre-processing step in the classification pipeline to counter adversarial attacks (e.g., Fast Gradient Sign Method, DeepFool) and dramatically reduce their effects. An important component of JPEG compression is its ability to remove high-frequency signal components inside square blocks of an image. Such an operation is equivalent to selective blurring of the image, helping remove additive perturbations. Further, we propose an ensemble-based technique that can be constructed quickly from a given well-performing DNN, and empirically show how such an ensemble that leverages JPEG compression can protect a model from multiple types of adversarial attacks, without requiring knowledge about the model.
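The abstract's claim that discarding high-frequency components inside square blocks helps remove additive perturbations can be illustrated with a small sketch. This is not the authors' pipeline and not a full JPEG codec: real JPEG quantizes DCT coefficients with quality-dependent tables and adds chroma subsampling and entropy coding. The sketch below assumes a simplified stand-in that zeroes the high-frequency DCT coefficients of one 8x8 block. The function names, the `keep` parameter, and the toy checkerboard perturbation are all hypothetical choices made for illustration.

```python
import math

N = 8  # JPEG works on 8x8 pixel blocks


def _basis(k, n):
    """Orthonormal DCT-II basis value for frequency k at sample n."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos(math.pi * (2 * n + 1) * k / (2 * N))


# Precomputed basis matrix: C[k][n]
C = [[_basis(k, n) for n in range(N)] for k in range(N)]


def dct2(block):
    """2-D DCT of an 8x8 block given as a list of lists of floats."""
    return [[sum(C[u][x] * C[v][y] * block[x][y]
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse 2-D DCT (the basis is orthonormal, so this undoes dct2)."""
    return [[sum(C[u][x] * C[v][y] * coeffs[u][v]
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def low_pass_block(block, keep=4):
    """Assumed simplification of JPEG: zero coefficients with u + v >= keep."""
    coeffs = dct2(block)
    filtered = [[coeffs[u][v] if u + v < keep else 0.0
                 for v in range(N)] for u in range(N)]
    return idct2(filtered)


# Toy benign block: a smooth intensity ramp.
benign = [[100.0 + 5 * x + 3 * y for y in range(N)] for x in range(N)]
# Toy perturbation: a +/-4 checkerboard, i.e. mostly high-frequency content.
perturbation = [[4.0 if (x + y) % 2 == 0 else -4.0 for y in range(N)]
                for x in range(N)]
adversarial = [[benign[x][y] + perturbation[x][y] for y in range(N)]
               for x in range(N)]

clean_out = low_pass_block(benign)
adv_out = low_pass_block(adversarial)

# Largest per-pixel trace of the perturbation that survives the filter.
residual = max(abs(adv_out[x][y] - clean_out[x][y])
               for x in range(N) for y in range(N))
print(f"perturbation amplitude before: 4.0, after low-pass: {residual:.3f}")
```

In this toy setting the surviving perturbation is a small fraction of its original amplitude, while the smooth benign content is largely preserved. Real adversarial perturbations are not pure checkerboards, so the degree of suppression in practice depends on the attack and the compression level.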

Paper Structure

This paper contains 12 sections, 1 equation, 5 figures, 1 table.

Figures (5)

  • Figure 1: A comparison of the classification results of an exemplar image from the German Traffic Sign Recognition Benchmark (GTSRB) dataset. A benign image (left) is originally classified as a stop sign, but after the addition of an adversarial perturbation to the image (middle) the resulting image is classified as a max speed 100 sign. Using JPEG compression on the adversarial image (right), we recover the original classification of stop sign.
  • Figure 2: Applying JPEG compression (dashed lines with symbols) can counter FGSM and DeepFool attacks on the CIFAR-10 and GTSRB datasets, e.g., slightly compressing CIFAR-10 images dramatically lowers DeepFool's attack success rate, indicated by the steep orange line (left plot). $\Phi$ means no compression has been applied. Attacks can be further suppressed by "vaccinating" a DNN model by training it with compressed images, and using an ensemble of such models; our approach, discussed in Section \ref{sec:our-approach}, rectifies a great majority of misclassifications (indicated by the horizontal dashed lines).
  • Figure 3: Classification accuracies of each vaccinated model on the CIFAR-10 test set that has been compressed to a particular image quality. Each cluster of bars represents the model performances when tested with images having the corresponding image quality as indicated on the vertical axis. Within each cluster, each bar represents a vaccinated model. Vertical red lines denote the accuracy of the original, non-vaccinated model for that image quality.
  • Figure 4: Performance of the vaccinated models on adversarially constructed test sets. Each line with a symbol represents a vaccinated model and the black horizontal dotted line represents the accuracy of the original model under attack. These results demonstrate that re-training with JPEG compressed images can help recover from an adversarial attack.
  • Figure 5: Accuracies of all models under consideration when each model is individually attacked. The attack does get transferred to other models but is mitigated with increasing JPEG compression.
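
The ensemble of vaccinated models mentioned in the captions above can be sketched as follows. This page does not state how the ensemble combines its members' predictions, so the majority vote used here is an assumption, as are the names `ensemble_predict` and `members` and the stub classifiers. Each member is assumed to pair one compression step with the model trained at that compression level.

```python
from collections import Counter


def ensemble_predict(image, members):
    """Return the most common label among the members' predictions.

    members: list of (compress, classify) pairs, where `compress` maps an
    image to its compressed version and `classify` maps an image to a label.
    Majority voting is an assumed combination rule, not taken from the paper.
    """
    votes = [classify(compress(image)) for compress, classify in members]
    return Counter(votes).most_common(1)[0][0]


def identity(image):
    """Placeholder for a JPEG compression step at some quality level."""
    return image


# Stub classifiers standing in for vaccinated models (hypothetical labels).
members = [
    (identity, lambda img: "stop"),
    (identity, lambda img: "max_speed_100"),
    (identity, lambda img: "stop"),
]

prediction = ensemble_predict([0.0], members)
print(prediction)
```

In a real setup each `compress` would apply JPEG at a different quality and each `classify` would wrap a trained network; an attacker would then have to fool most members at once rather than a single model.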