A study of the effect of JPG compression on adversarial images

Gintare Karolina Dziugaite, Zoubin Ghahramani, Daniel M. Roy

TL;DR

Adversarial examples threaten neural network image classifiers: imperceptible perturbations can change their predictions. The study tests whether JPG compression, a ubiquitous image preprocessing step, can reverse perturbations generated by the Fast Gradient Sign Method (FGSM) by effectively projecting images back onto a JPG subspace, using a pre-trained OverFeat model on ImageNet. JPG recompression restores correct predictions to a large extent for small perturbations (ε=1) but not for larger ones (ε=5, 10), indicating that JPG compression alone is not a robust defense. The work highlights the limits of simple preprocessing for adversarial robustness and motivates further investigation of subspace projections and more resilient defenses in high-dimensional vision tasks.

Abstract

Neural network image classifiers are known to be vulnerable to adversarial images, i.e., natural images which have been modified by an adversarial perturbation specifically designed to be imperceptible to humans yet fool the classifier. Not only can adversarial images be generated easily, but these images will often be adversarial for networks trained on disjoint subsets of data or with different architectures. Adversarial images represent a potential security risk as well as a serious machine learning challenge: it is clear that vulnerable neural networks perceive images very differently from humans. Noting that virtually every image classification data set is composed of JPG images, we evaluate the effect of JPG compression on the classification of adversarial images. For Fast-Gradient-Sign perturbations of small magnitude, we found that JPG compression often reverses the drop in classification accuracy to a large extent, but not always. As the magnitude of the perturbations increases, JPG recompression alone is insufficient to reverse the effect.
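The Fast-Gradient-Sign perturbation mentioned in the abstract can be sketched on a toy model. The snippet below is only an illustration under stated assumptions, not the authors' OverFeat setup: the linear softmax classifier (`W`, `b`), the random 64-pixel "image", the helper names `softmax` and `fgsm`, and the choice of a [0, 255] pixel scale for ε are all hypothetical. It moves each pixel by ε in the direction of the sign of the loss gradient and reports how the probability of the clean image's top label changes.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def fgsm(x, W, b, label, eps):
    """x + eps * sign(grad_x loss) for a toy linear softmax model (assumption)."""
    p = softmax(W @ x + b)
    # The gradient of the cross-entropy -log p[label] w.r.t. the logits is
    # p - onehot(label); the chain rule through the linear layer gives W^T(...).
    onehot = np.zeros_like(p)
    onehot[label] = 1.0
    grad_x = W.T @ (p - onehot)
    # Keep the result inside the assumed valid pixel range [0, 255].
    return np.clip(x + eps * np.sign(grad_x), 0.0, 255.0)

rng = np.random.default_rng(0)
n_pixels, n_classes = 64, 5      # toy sizes, chosen only for illustration
W = rng.normal(scale=0.05, size=(n_classes, n_pixels))
b = np.zeros(n_classes)
x = rng.uniform(0, 255, size=n_pixels)

label = int(np.argmax(softmax(W @ x + b)))   # top label of the clean input
p_clean = softmax(W @ x + b)[label]

for eps in (1, 5, 10):
    x_adv = fgsm(x, W, b, label, eps)
    p_adv = softmax(W @ x_adv + b)[label]
    print(f"eps={eps:2d}  max|x_adv - x|={np.abs(x_adv - x).max():.1f}  "
          f"p(top label): {p_clean:.3f} -> {p_adv:.3f}")
```

In this toy setting the loss is convex in the input, so the perturbed image never assigns a higher probability to the original top label than the clean image does. A deep network offers no such guarantee, which is one reason the study's results are empirical.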

Paper Structure

This paper contains 5 sections, 4 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: The red dots represent the data and the grey line the data subspace. The solid blue arrow is the adversarial perturbation that moves the data point $x$ away from the data subspace and the dotted blue arrow is the projection on the subspace. In the case where the perturbation is approximately orthogonal to the JPG subspace, JPG compression brings the adversarial example back to the data subspace.
  • Figure 2: (first) Original image $x$, with label "agama" assigned 0.99 probability; (second) Adversarial image $\mathrm{Adv}_{\epsilon}(x)$, where $\epsilon = 1$, with label "rock crab" assigned 0.93 probability and label "agama" assigned $6 \times 10^{-5}$ probability; (third and fourth) Adversarial images $\mathrm{Adv}_{\epsilon}(x)$ with $\epsilon$ set to 5 and 10. Both assign probability $\approx 0$ to "agama". However, adversarial noise becomes apparent; (last) JPG compression of the adversarial image, $\mathrm{JPG}(\mathrm{Adv}_{\epsilon}(x))$ with $\epsilon =1$, with label "agama" assigned 0.96 probability.
  • Figure 3: The top-label probabilities, i.e., the predicted probability (y-axis) assigned to the most likely class $\ell_{x}$, after various transformations $x \mapsto f(x)$. The red horizontal line in each box plot is the average top-label probability. The solid red line is the median, the box represents the interquartile range, and the whiskers represent the minimum and maximum values, excluding outliers. Labels along the bottom specify the transformation $f(x)$ applied to the image $x$ before measuring the top-label probability.
  • Figure 4: In every scatter plot, each validation image $x$ is represented by a point $(p_1,p_2)$, which specifies the top-label probabilities $p_j = p_w(\ell_{x}|f_j(x))$ under a pair $(f_1,f_2)$ of modifications of the image, respectively. All adversarial perturbations in these figures were generated with magnitude $\epsilon =1$. Along the top row, the $x$-axis represents the top-label probability for a clean image. (top left) The plot illustrates the effect of JPG compression of a natural image. The predictions do change, but on average they lie close to the diagonal and do not change the top-label probability appreciably; (top middle) If JPG compression of the adversarial image removed adversarial perturbations, we would expect this plot to look like the one to the left. While they are similar (most points lie around the diagonal), more images lie in the lower right triangle, suggesting that the adversarial perturbations are sometimes not removed or only partially removed; (top right) Adding JPG noise does not reverse the effect of adversarial perturbations: indeed, points lie closer to the lower axis than under a simple adversarial modification; (bottom left) The top-label probabilities after adversarial perturbation drop substantially on average; (bottom right) This plot complements the top-middle plot. Most of the points lie in the upper left triangle, which suggests that JPG compression of an adversarial image increases the top-label probability and partially reverses the effect of many adversarial perturbations.
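
The geometric picture in Figure 1 can be illustrated numerically. The sketch below is a linear stand-in only: real JPG compression is a nonlinear quantization, not an orthogonal projector, and the dimensions, the random subspace basis `B`, and the projector `P` are assumptions made for the example. It compares a perturbation orthogonal to a toy "data subspace" with one lying inside it.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 50, 5                                  # ambient and subspace dimensions (hypothetical)

# Orthonormal basis of a toy "data subspace" and the orthogonal projector onto it.
B, _ = np.linalg.qr(rng.normal(size=(d, k)))
P = B @ B.T

x = B @ rng.normal(size=k)                    # a data point lying in the subspace
g = rng.normal(size=d)                        # an arbitrary direction
delta_orth = g - P @ g                        # component orthogonal to the subspace
delta_in = P @ g                              # component inside the subspace

for name, delta in [("orthogonal", delta_orth), ("in-subspace", delta_in)]:
    recovered = P @ (x + delta)               # project the perturbed point back
    residual = np.linalg.norm(recovered - x)  # how much of the perturbation survives
    print(f"{name:12s} |delta|={np.linalg.norm(delta):.3f}  "
          f"residual after projection={residual:.3f}")
```

Projection removes the orthogonal perturbation entirely and leaves the in-subspace one untouched. This matches the caption's qualifier that recovery is expected when the perturbation is approximately orthogonal to the JPG subspace, and it suggests one way a perturbation could be only partially removed.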