Leveraging Generalizability of Image-to-Image Translation for Enhanced Adversarial Defense

Haibo Zhang, Zhihua Yao, Kouichi Sakurai, Takeshi Saitoh

TL;DR

This paper addresses adversarial vulnerabilities in deep neural networks by proposing a generalizable defense based on image-to-image translation enhanced with residual blocks. The method uses a conditional GAN with a U-Net–style generator and a PatchGAN discriminator, optimized by a combined loss $\mathcal{L}_{cGAN} + \lambda_1 \mathcal{L}_{L1} + \lambda_2 \mathcal{L}_{perc}$, and is evaluated on MNIST, Fashion-MNIST, CIFAR-10, and ImageNet across multiple attacks. It demonstrates that training on a composite set of attacks yields strong cross-attack robustness and transferability across target models, with high restoration accuracy and competitive image quality (PSNR) while maintaining efficiency. Ablation studies show seven residual blocks offer the best trade-off between performance and training time. The work suggests practical, scalable defenses against evolving adversarial threats, with potential deployment in safety-critical applications and future work exploring printed adversarial examples.
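As a rough illustration of how the combined objective $\mathcal{L}_{cGAN} + \lambda_1 \mathcal{L}_{L1} + \lambda_2 \mathcal{L}_{perc}$ might be assembled for the generator, here is a minimal pure-Python sketch. Several parts are assumptions rather than details taken from the paper: the adversarial term is written as a binary cross-entropy over PatchGAN patch probabilities, the perceptual term is a mean absolute difference between feature vectors, `features` is a hypothetical stand-in for a pretrained feature extractor, and the default values of `lambda_1` and `lambda_2` are placeholders.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def bce(prob: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for a single discriminator probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to keep log() finite
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def mean_abs_diff(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equally sized vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def generator_loss(
    restored: Vector,                       # generator output, flattened pixels
    clean: Vector,                          # clean target image, flattened pixels
    patch_probs: Sequence[float],           # discriminator outputs on the restored image
    features: Callable[[Vector], Vector],   # hypothetical feature extractor
    lambda_1: float = 100.0,                # placeholder weight, not from the paper
    lambda_2: float = 1.0,                  # placeholder weight, not from the paper
) -> float:
    # Adversarial term: the generator wants every patch judged "real" (target 1).
    adv = sum(bce(p, 1.0) for p in patch_probs) / len(patch_probs)
    # Pixel-level L1 term between restored and clean images.
    l1 = mean_abs_diff(restored, clean)
    # Perceptual term (assumed form): L1 distance in feature space.
    perc = mean_abs_diff(features(restored), features(clean))
    return adv + lambda_1 * l1 + lambda_2 * perc


if __name__ == "__main__":
    identity = lambda v: list(v)  # trivial stand-in for a feature extractor
    print(generator_loss([0.0, 1.0], [0.5, 0.5], [0.5], identity, 2.0, 1.0))
```

In a real training loop the three terms would be computed on tensors by a deep learning framework so that gradients can flow back to the generator; the sketch only shows how the weighted sum is formed.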

Abstract

In the rapidly evolving field of artificial intelligence, machine learning emerges as a key technology characterized by its vast potential and inherent risks. The stability and reliability of these models are important, as they are frequent targets of security threats. Adversarial attacks, first rigorously defined by Ian Goodfellow et al. in 2013, highlight a critical vulnerability: they can trick machine learning models into making incorrect predictions by applying nearly invisible perturbations to images. Although many studies have focused on constructing sophisticated defensive mechanisms to mitigate such attacks, they often overlook the substantial time and computational costs of training and maintaining these models. Ideally, a defense method should be able to generalize across various, even unseen, adversarial attacks with minimal overhead. Building on our previous work on image-to-image translation-based defenses, this study introduces an improved model that incorporates residual blocks to enhance generalizability. The proposed method requires training only a single model, effectively defends against diverse attack types, and transfers well between different target models. Experiments show that our model can restore the classification accuracy from near zero to an average of 72% while maintaining competitive performance compared to state-of-the-art methods.

Paper Structure

This paper contains 31 sections, 12 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: Adversarial examples of FGSM attack, PGD attack, C&W attack and AutoAttack.
  • Figure 2: An adversarial example under the FGSM attack with $\epsilon = 10/255$.
  • Figure 3: Comprehensive architecture of the proposed image reconstruction method for defending against adversarial attacks.
  • Figure 4: The computation of the PSNR and MAE values for both the images subjected to six types of adversarial attacks and those reconstructed by the universal defense model when compared to the original images.
  • Figure 5: Robustness Check using the PGD attack and the MI-FGSM attack. To simulate different attack strengths, we gradually change the iteration number from 10 to 100, and the $\epsilon$ includes 2/255, 5/255, and 10/255.
  • ...and 2 more figures
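
The PSNR and MAE comparisons mentioned in the Figure 4 caption can be sketched as follows. This is a minimal illustration, assuming images are flattened lists of pixel values scaled to [0, 1]; the function names and the toy perturbation are ours, not the paper's evaluation code.

```python
import math
from typing import Sequence


def mae(reference: Sequence[float], image: Sequence[float]) -> float:
    """Mean absolute error between a reference image and a compared image."""
    return sum(abs(r - x) for r, x in zip(reference, image)) / len(reference)


def psnr(reference: Sequence[float], image: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, image)) / len(reference)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    eps = 10 / 255                       # perturbation budget from the Figure 2 caption
    original = [0.5] * 16                # toy "image" with mid-gray pixels
    # Toy worst case: every pixel shifted by exactly +/- eps, no clipping needed.
    signs = [1 if i % 2 == 0 else -1 for i in range(16)]
    perturbed = [p + s * eps for p, s in zip(original, signs)]
    print(mae(original, perturbed), psnr(original, perturbed))
```

For this toy case, where every pixel moves by exactly $\epsilon = 10/255$, the MAE equals $\epsilon$ and the PSNR is $20 \log_{10}(255/10)$, roughly 28.1 dB. A reconstruction that brings the image closer to the original would lower the MAE and raise the PSNR.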