Countering Adversarial Images using Input Transformations
Chuan Guo, Mayank Rana, Moustapha Cisse, Laurens van der Maaten
TL;DR
The paper studies model-agnostic input transformations as defenses against adversarial image perturbations on ImageNet: cropping-rescaling, bit-depth reduction, JPEG compression, total variation minimization (TVM), and image quilting. TVM and image quilting are the strongest of these, particularly when the network is trained on transformed images, because they are non-differentiable and inherently random, which makes them hard for an adversary to circumvent. The best defense eliminates 60% of strong gray-box attacks and 90% of strong black-box attacks, and ensembling and model transfer improve robustness further. The authors suggest combining input transformations with other defense strategies to improve adversarial robustness, including in other domains.
Abstract
This paper investigates strategies that defend against adversarial-example attacks on image-classification systems by transforming the inputs before feeding them to the system. Specifically, we study applying image transformations such as bit-depth reduction, JPEG compression, total variation minimization, and image quilting before feeding the image to a convolutional network classifier. Our experiments on ImageNet show that total variation minimization and image quilting are very effective defenses in practice, particularly when the network is trained on transformed images. The strength of those defenses lies in their non-differentiable nature and their inherent randomness, which make it difficult for an adversary to circumvent them. Our best defense eliminates 60% of strong gray-box and 90% of strong black-box attacks by a variety of major attack methods.
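To make the idea of an input transformation concrete, the following is a minimal sketch of bit-depth reduction, the simplest of the transformations listed above. It is an illustration under stated assumptions, not the authors' implementation: the function name `reduce_bit_depth`, the flat list of 8-bit pixel values, and the default of 3 bits are all choices made for this example.

```python
def reduce_bit_depth(pixels, bits=3):
    """Quantize 8-bit values (0..255) to `bits` bits, then map back to 0..255.

    `pixels` is a flat list of integer channel values (an assumption made for
    this sketch; a real pipeline would operate on image arrays).
    """
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    levels = (1 << bits) - 1  # index of the highest quantization level
    reduced = []
    for p in pixels:
        if not 0 <= p <= 255:
            raise ValueError("pixel values must lie in 0..255")
        # Nearest quantization level, using integer round-half-up arithmetic.
        q = (p * levels + 127) // 255
        # Map the level back onto the 0..255 range.
        reduced.append((q * 255 + levels // 2) // levels)
    return reduced


if __name__ == "__main__":
    # Hypothetical pixel values; the output has at most 2**bits distinct values.
    print(reduce_bit_depth([0, 100, 255], bits=3))
```

The transformed image would then be passed to the classifier in place of the original. Small perturbations that do not move a pixel across a quantization boundary are removed by this step, which is the intuition behind using it as a defense.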
