Transferable Learned Image Compression-Resistant Adversarial Perturbations

Yang Sui, Zhuohang Li, Ding Ding, Xiang Pan, Xiaozhong Xu, Shan Liu, Zhenzhong Chen

TL;DR

The goal is to add an adversarial perturbation $\delta$ to the source image $X$ such that the reconstructed adversarial example $g_s(Q(g_a(X+\delta)))$ is misclassified by the classification model.
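
One common way to write this goal as a constrained optimization problem is sketched below. This is not necessarily the paper's own formulation: the classifier $f$, the loss $\mathcal{L}$, the true label $y$, and the $\ell_\infty$ budget $\epsilon$ are assumed notation that the TL;DR does not define.

$$
\max_{\delta}\ \mathcal{L}\big(f(g_s(Q(g_a(X+\delta)))),\, y\big) \quad \text{subject to} \quad \|\delta\|_\infty \le \epsilon
$$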

Abstract

Adversarial attacks can readily disrupt image classification systems, revealing the vulnerability of DNN-based recognition tasks. Existing adversarial perturbations are primarily applied to uncompressed images or to images compressed with the traditional method, i.e., JPEG; few studies have investigated the robustness of image classification models in the context of DNN-based image compression. With the rapid evolution of image compression, DNN-based learned image compression has emerged as a promising approach for transmitting images in many security-critical applications, such as cloud-based face recognition and autonomous driving, owing to its superior performance over traditional compression. There is therefore a pressing need to investigate the robustness of classification systems whose inputs have been processed by learned image compression. To bridge this research gap, we explore adversarial attacks on a new pipeline: image classification models that use learned image compressors as pre-processing modules. Furthermore, to enhance the transferability of perturbations across quality levels and architectures of learned image compression models, we introduce a saliency score-based sampling method that enables fast generation of transferable perturbations. Extensive experiments with popular attack methods demonstrate the enhanced transferability of our proposed method when attacking images that have been processed by different learned image compression models.
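
The compress-then-classify pipeline described above can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a uniform scalar quantizer stands in for the learned codec, the classifier is a hypothetical two-class linear model, and gradients are passed through the rounding step with a straight-through approximation. The PGD settings (budget 16, step size 2, 20 iterations) are borrowed from the Figure 3 caption and assumed to be on a 0 to 255 pixel scale. The saliency score-based sampling method is not modeled here.

```python
import math

EPS, ALPHA, ITERS = 16.0, 2.0, 20  # Figure 3 caption values; 0-255 scale is assumed


def g_a(x, step):
    """Toy analysis transform (assumption): scale pixels into latent units."""
    return [v / step for v in x]


def quantize(y):
    """Q: round every latent value to the nearest integer."""
    return [float(round(v)) for v in y]


def g_s(y_hat, step):
    """Toy synthesis transform (assumption): scale latents back to pixels."""
    return [v * step for v in y_hat]


def reconstruct(x, step):
    """Full codec round trip g_s(Q(g_a(x)))."""
    return g_s(quantize(g_a(x, step)), step)


def logit(x, w, b):
    """Hypothetical linear classifier: positive logit means class 1."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(x, w, b):
    return 1 if logit(x, w, b) > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def pgd_through_codec(x, label, w, b, step, eps=EPS, alpha=ALPHA, iters=ITERS):
    """L-infinity PGD that maximizes the loss of the classifier applied to
    the *reconstructed* image rather than to the raw input."""
    x_adv = list(x)
    for _ in range(iters):
        z = logit(reconstruct(x_adv, step), w, b)
        p = 1.0 / (1.0 + math.exp(-z))
        # Binary cross-entropy gradient w.r.t. the input. Rounding is treated
        # as identity in the backward pass (straight-through), and the scale
        # factors of g_a and g_s cancel, so d(loss)/dx_i = (p - label) * w_i.
        grad = [(p - label) * wi for wi in w]
        # Ascent step, then projection onto the eps-ball and the pixel range.
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 255.0) for xa in x_adv]
    return x_adv


# Hypothetical 4-pixel "image" and classifier weights.
x = [140.0, 100.0, 150.0, 110.0]
w = [0.01, -0.01, 0.01, -0.01]
b = -0.2
step = 10.0

x_adv = pgd_through_codec(x, label=1, w=w, b=b, step=step)
print(predict(reconstruct(x, step), w, b), predict(reconstruct(x_adv, step), w, b))
```

In this toy setting the clean reconstruction is assigned class 1, while the reconstruction of the perturbed image is assigned class 0, even though no pixel moved by more than the budget. With a real learned codec, the gradient through the analysis and synthesis networks would come from automatic differentiation instead of the closed form used here.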

Paper Structure (10 sections, 5 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Our proposed adversarial attack pipeline against the LICCS. Left: the LIC framework (described in Section \ref{method:LIC}). Middle: the LICCS pipeline, in which both the original image and the reconstructed image are classified with the correct label "Cat". Right: after the adversarial attack, the adversarial example itself is still classified correctly, but its reconstruction through the LIC is misclassified as "Dog".
  • Figure 2: Top-1 accuracy under PGD black-box attack on the cheng2020 \cite{cheng2020learned} model. Each row/column corresponds to a surrogate/target model at a given quality level.
  • Figure 3: ASR of the PGD black-box attack on quality levels 1 to 6 of the cheng2020 \cite{cheng2020learned} model, with $\epsilon=16$, $\alpha=2$, $\mathrm{iters}=20$.
  • Figure 4: Top-1 accuracy of the LICCS under PGD attack, with cheng2020 as the surrogate model and hyper as the target model. Lower accuracy indicates higher transferability.
  • Figure 5: Top-1 accuracy of the LICCS after PGD black-box attack on the cheng2020 \cite{cheng2020learned} model. Each row/column corresponds to a surrogate/target model at a given quality level.
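
The captions above report two metrics, Top-1 accuracy and ASR. The sketch below shows one common way to compute them from predicted labels. The ASR definition used here (the fraction of samples classified correctly before the attack that are misclassified after it) is an assumption, since the captions do not define it, and the function names and sample predictions are hypothetical.

```python
def top1_accuracy(preds, labels):
    """Fraction of samples whose top prediction equals the true label."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(clean_preds, adv_preds, labels):
    """Among samples classified correctly before the attack, the fraction
    that are misclassified after it (one common ASR definition; assumed)."""
    eligible = [(a, y) for c, a, y in zip(clean_preds, adv_preds, labels) if c == y]
    if not eligible:
        return 0.0
    return sum(a != y for a, y in eligible) / len(eligible)


# Hypothetical predictions on reconstructed images, before and after an attack.
labels = [0, 1, 1, 0, 1]
clean_preds = [0, 1, 0, 0, 1]
adv_preds = [1, 1, 0, 1, 1]

clean_acc = top1_accuracy(clean_preds, labels)
adv_acc = top1_accuracy(adv_preds, labels)
asr = attack_success_rate(clean_preds, adv_preds, labels)
print(clean_acc, adv_acc, asr)
```

Under this definition, a lower post-attack Top-1 accuracy and a higher ASR both indicate a stronger attack, which is consistent with the note in the Figure 4 caption that lower accuracy indicates higher transferability.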