Efficient Image-to-Image Diffusion Classifier for Adversarial Robustness

Hefei Mei, Minjing Dong, Chang Xu

TL;DR

This work reframes diffusion-model-based defenses from image generation to image-label translation, addressing the heavy computational burden that limits practical deployment. By constructing orthogonal pixel-space image labels and training a pruned U-Net with reduced diffusion steps, the authors introduce an Image-to-Image Diffusion Classifier (IDC) that performs classification by translating inputs toward predefined labels and measuring distances. The model integrates an intra-class ELBO-like objective and a novel inter-class loss to boost class separability, achieving strong adversarial robustness with far fewer parameters and lower FLOPs than prior DM-based defenses while remaining competitive with CNN-based defenses. The proposed approach enables full-dataset evaluation and practical deployment, with code released for reproducibility and further research.
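The classification rule described above (translate the input toward a predefined image label, then pick the nearest label) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the function names `make_orthogonal_labels` and `classify_by_distance` are hypothetical, the labels are built by QR-orthogonalizing random vectors (the paper's actual label construction may differ), and the "translated" images are simulated as noisy labels rather than produced by a U-Net.

```python
import numpy as np

def make_orthogonal_labels(num_classes, shape=(3, 32, 32), seed=0):
    """Return num_classes mutually orthogonal, unit-norm 'image labels'.

    Assumption: labels come from QR-orthogonalizing random Gaussian
    vectors; this is an illustrative construction, not the paper's.
    """
    dim = int(np.prod(shape))
    rng = np.random.default_rng(seed)
    # Reduced QR of a (dim, num_classes) matrix yields orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    return q.T.reshape((num_classes, *shape))

def classify_by_distance(translated, labels):
    """Pick the class whose label is closest (L2) to each translated image."""
    flat_x = translated.reshape(len(translated), -1)
    flat_l = labels.reshape(len(labels), -1)
    # Pairwise distances, shape (batch, num_classes).
    dists = np.linalg.norm(flat_x[:, None, :] - flat_l[None, :, :], axis=-1)
    return dists.argmin(axis=1)

# Simulated network outputs: the labels of classes 3 and 7 plus small noise.
labels = make_orthogonal_labels(10)
rng = np.random.default_rng(1)
outputs = labels[[3, 7]] + 0.01 * rng.standard_normal((2, 3, 32, 32))
predictions = classify_by_distance(outputs, labels)
```

Because the labels are orthonormal, any two distinct labels are the same distance apart, so no class is geometrically closer to another by construction.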

Abstract

Diffusion models (DMs) have demonstrated great potential in adversarial robustness: DM-based defense methods can achieve superior defense capability without adversarial training. However, they all incur huge computational costs because they rely on large-scale pre-trained DMs, which makes it difficult to conduct full evaluations under strong attacks and to compare with traditional CNN-based methods. Simply reducing the network size and timesteps in DMs could significantly harm image generation quality, which invalidates previous frameworks. To alleviate this issue, we redesign the diffusion framework from generating high-quality images to predicting distinguishable image labels. Specifically, we employ an image translation framework to learn a many-to-one mapping from input samples to designed orthogonal image labels. Based on this framework, we introduce an efficient Image-to-Image diffusion classifier with a pruned U-Net structure and reduced diffusion timesteps. Beyond the framework, we redesign the optimization objective of DMs to fit the target of image classification: a new classification loss is incorporated into the DM-based image translation framework to distinguish the generated label from those of other classes. We conduct extensive evaluations of the proposed classifier under various attacks on popular benchmarks. Experiments show that our method achieves better adversarial robustness at lower computational cost than DM-based and CNN-based methods. The code is available at https://github.com/hfmei/IDC
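To make the idea of combining an intra-class objective with a loss that separates the generated label from other classes more concrete, here is a rough NumPy sketch. It is not the loss from the paper, whose exact form is not reproduced on this page. The function name `intra_inter_loss`, the `weight` parameter, and the choice of softmax cross-entropy over negative squared distances for the inter-class term are all assumptions chosen only to show the pull-toward-own-label, push-from-other-labels behavior.

```python
import numpy as np

def intra_inter_loss(pred, labels, target, weight=1.0):
    """Hypothetical combined loss for a batch of generated labels.

    pred:   (B, D) generated image labels, flattened
    labels: (C, D) predefined image labels, flattened
    target: (B,)   ground-truth class indices
    """
    # Squared L2 distance from every prediction to every class label: (B, C).
    d2 = ((pred[:, None, :] - labels[None, :, :]) ** 2).sum(axis=-1)
    idx = np.arange(len(pred))
    # Intra-class term: pull each prediction toward its own label.
    intra = d2[idx, target].mean()
    # Inter-class term (assumed form): cross-entropy over negative distances,
    # which penalizes predictions that sit close to other classes' labels.
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    inter = -log_prob[idx, target].mean()
    return intra + weight * inter

# Toy check with 4 one-hot "labels": correct outputs score lower than swapped ones.
toy_labels = np.eye(4)
target = np.array([0, 1])
loss_good = intra_inter_loss(toy_labels[[0, 1]], toy_labels, target)
loss_bad = intra_inter_loss(toy_labels[[1, 0]], toy_labels, target)
```

In this toy setup the intra-class term alone already prefers the correct outputs; the inter-class term adds a penalty that grows when a prediction lands near another class's label, which is the separability effect the abstract describes.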

Paper Structure (46 sections, 1 theorem, 37 equations, 6 figures, 7 tables)

Key Result

Proposition 1

For the objective in Eq. (ELBO_inter), the training loss can be simplified as:

Figures (6)

  • Figure 1: Comparison of robust accuracy with both CNN-based and DM-based benchmarks under the BPDA+EOT attack. The area of each circle indicates the inference FLOPs: 27.09G for our IDC versus 14.26T for DiffPure. The dark center of each circle marks the exact value.
  • Figure 2: Comparison of different diffusion classifier paradigms. The number of iterations $K>T_s$ in our IDC while the timesteps $t_L \gg T_s$. The parameters of $U_L$ in both ZDC and DiffPure are also larger than $U_s$ in IDC.
  • Figure 3: Illustration of our framework and optimization loss. Triangles represent input samples, while circles represent samples generated by the network. In panel (a), orthogonal labels in the high-dimensional pixel space are depicted with a three-dimensional schematic. Each differently colored pentagram corresponds to the image label of a distinct category.
  • Figure 4: Robustness under different PGD attack settings and different diffusion timesteps. (a) Larger numbers of attack iterations. (b) Larger perturbation sizes. The symbol * denotes the adversarial robustness of IDC without the inter-class loss. (c) Standard and robust accuracy under PGD attack and AutoAttack $\ell_{\infty}$ ($\epsilon=8/255$) with different timesteps.
  • Figure 5: Illustration of the pruned U-Net architecture. The contracting path (left side) serves as a schematic for the pruning process, while the expansive path (right side) is adapted to match the adjusted contracting path.
  • ...and 1 more figures
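
As a quick sanity check on the FLOPs figures quoted in the Figure 1 caption, the short calculation below computes how many times more inference compute DiffPure uses than IDC. It also computes the implied ratio of circle radii, under the assumption (not stated in the caption) that circle area scales linearly with FLOPs.

```python
import math

idc_flops = 27.09e9        # 27.09 GFLOPs, from the Figure 1 caption
diffpure_flops = 14.26e12  # 14.26 TFLOPs, from the Figure 1 caption

ratio = diffpure_flops / idc_flops
# Assumption: area is proportional to FLOPs, so radius scales with sqrt(FLOPs).
radius_ratio = math.sqrt(ratio)
print(f"FLOPs ratio: {ratio:.0f}x, radius ratio: {radius_ratio:.1f}x")
# prints "FLOPs ratio: 526x, radius ratio: 22.9x"
```

In other words, the caption's numbers imply that IDC needs roughly 1/526 of DiffPure's inference FLOPs.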

Theorems & Definitions (1)

  • Proposition 1