A study of the effect of JPG compression on adversarial images
Gintare Karolina Dziugaite, Zoubin Ghahramani, Daniel M. Roy
TL;DR
Adversarial examples threaten neural network image classifiers: imperceptible perturbations added to an image can change the predicted class. The study tests whether a ubiquitous preprocessing step, JPG compression, can reverse perturbations generated by the Fast Gradient Sign Method (FGSM) by effectively projecting images back onto the subspace of JPG images, using a pre-trained OverFeat model on ImageNet. Results show that JPG recompression substantially restores correct predictions for small perturbations (ε=1) but not for larger ones (ε=5, 10), indicating that JPG compression alone is not a robust defense. The work highlights the limits of simple preprocessing for adversarial robustness and motivates further investigation of subspace projections and more resilient defenses in high-dimensional vision tasks.
Abstract
Neural network image classifiers are known to be vulnerable to adversarial images, i.e., natural images which have been modified by an adversarial perturbation specifically designed to be imperceptible to humans yet fool the classifier. Not only can adversarial images be generated easily, but these images will often be adversarial for networks trained on disjoint subsets of data or with different architectures. Adversarial images represent a potential security risk as well as a serious machine learning challenge: it is clear that vulnerable neural networks perceive images very differently from humans. Noting that virtually every image classification data set is composed of JPG images, we evaluate the effect of JPG compression on the classification of adversarial images. For Fast-Gradient-Sign perturbations of small magnitude, we found that JPG compression often reverses the drop in classification accuracy to a large extent, but not always. As the magnitude of the perturbations increases, JPG recompression alone is insufficient to reverse the effect.
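
To make the Fast-Gradient-Sign perturbation concrete, the following is a minimal sketch on a toy problem, not the setup used in the study. It assumes a four-pixel "image" with intensities on a 0 to 255 scale, a hypothetical linear classifier with a logistic loss, and a true label of "positive"; the names `linear_score`, `loss_gradient`, and `fgsm_perturb` are illustrative only. Each pixel is moved by a fixed step `eps` in the direction of the sign of the loss gradient, then clipped to the valid range.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def linear_score(pixels, weights, bias):
    """Score of a toy linear classifier; a positive score means class 'positive'."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias


def loss_gradient(pixels, weights, bias):
    """Gradient of the logistic loss -log(sigmoid(score)) with respect to each pixel,
    assuming the true label is 'positive'."""
    s = linear_score(pixels, weights, bias)
    coeff = -(1.0 - 1.0 / (1.0 + math.exp(-s)))
    return [coeff * w for w in weights]


def fgsm_perturb(pixels, grad, eps):
    """Fast-Gradient-Sign step: x_adv = clip(x + eps * sign(grad), 0, 255)."""
    return [min(255, max(0, p + eps * sign(g))) for p, g in zip(pixels, grad)]


# Hypothetical toy values, chosen so the arithmetic is exact.
weights = [0.5, -0.25, 0.125, -0.5]
bias = -9.0
pixels = [40, 30, 20, 10]

grad = loss_gradient(pixels, weights, bias)
adversarial = fgsm_perturb(pixels, grad, eps=1)

print(linear_score(pixels, weights, bias))       # 1.0  (classified 'positive')
print(adversarial)                               # [39, 31, 19, 11]
print(linear_score(adversarial, weights, bias))  # -0.375 (prediction flipped)
```

In this toy case a change of one intensity level per pixel is enough to flip the prediction. The recompression step evaluated in the study would then correspond to encoding `adversarial` as a JPG and decoding it again before classification, which requires an image library and is not shown here.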
