MMAD-Purify: A Precision-Optimized Framework for Efficient and Scalable Multi-Modal Attacks

Xinxin Liu, Zhongliang Guo, Siyuan Huang, Chun Pong Lau

TL;DR

This work introduces a framework that builds on the distilled backbone of diffusion models and adds a precision-optimized noise predictor to strengthen the attack. The framework achieves outstanding transferability and robustness against purification defenses, outperforming existing gradient-based attack models in both effectiveness and efficiency.

Abstract

Neural networks have achieved remarkable performance across a wide range of tasks, yet they remain susceptible to adversarial perturbations, which pose significant risks in safety-critical applications. With the rise of multimodality, diffusion models have emerged as powerful tools not only for generative tasks but also for applications such as image editing, inpainting, and super-resolution. However, these models still lack robustness, because little research has attacked them with the goal of improving their resilience. Traditional attack techniques, such as gradient-based adversarial attacks and diffusion model-based methods, are iterative and therefore suffer from computational inefficiency and poor scalability. To address these challenges, we introduce a framework that leverages the distilled backbone of diffusion models and incorporates a precision-optimized noise predictor. This approach not only enhances the attack's potency but also significantly reduces computational cost. Our framework targets multi-modal adversarial attacks, offering reduced latency and high-fidelity adversarial examples with superior success rates. Furthermore, we demonstrate that our framework achieves outstanding transferability and robustness against purification defenses, outperforming existing gradient-based attack models in both effectiveness and efficiency.

Paper Structure

This paper contains 25 sections, 20 equations, 7 figures, 6 tables, 2 algorithms.

Figures (7)

  • Figure 1: Examples of MMAD-Purify. The input image $\mathbf{x}$ is attacked by MMAD-Purify to create $\mathbf{x}_{\text{adv}}$, which is then purified to obtain $\mathbf{x}_{\text{adv}}^p$. The final sample demonstrates high image quality and robustness against defenses.
  • Figure 2: Overview of MMAD-Purify. In the MMAD-Purify framework, the input image is first processed through an encoder. It then enters the first distilled DM pipeline, where a precision-optimized noise predictor is applied. The resulting latents of the input image, combined with latents of other modalities, form a multi-modal representation. This multi-modal representation is then passed through a target classifier to generate perturbations, which are added back to the input image. This process is repeated iteratively, ultimately generating the adversarial example $\mathbf{x}_{adv}$. After purification, the final $\mathbf{x}_{adv}^p$ has a different label from the generated image, demonstrating a successful adversarial attack.
  • Figure 3: Visual comparison of adversarial samples generated with ResNet-50 by 3 methods under 2 different defense techniques. Zooming in on the left shows that the adversarial perturbation generated by our method is less perceptible, and that its anti-purification capability remains strong even when the defense method is a black box ($p_1$).
  • Figure 4: Ablation study on $\epsilon$, $I$, $L$, and $\tau$. Avg ASR is the mean performance across six neural networks. Avg IQA is calculated using min-max normalization of PSNR, SSIM, FID, and LPIPS; FID and LPIPS are inverted (1 - min-max) so that a higher value consistently indicates better quality.
  • Figure 5: Visual comparison of MMA, MMAD, and MMAD-Purify. From left to right: original image $\mathbf{x}$, adversarial sample of MMA, adversarial sample of MMAD, and adversarial sample of MMAD-Purify. The green box highlights details of the adversarial samples from the different methods. We observe that MMAD-Purify demonstrates the highest fidelity to the original image compared to the other two attack methods, highlighting its effectiveness in preserving image quality while maintaining adversarial properties. In our experiments, we set the number of attack steps $I = 10$, attack budget $\epsilon = 16/255$, number of timesteps $\tau = 4$ for MMAD and MMAD-Purify, and $\tau = 50$ for MMA. The noising strengths for MMA, MMAD, and MMAD-Purify are $3/50$, $0.6$, and $0.05$, respectively, with the number of precision-optimized steps $L = 10$. In our initial experiments, we found that both the Image Quality Assessment (IQA) and Attack Success Rate (ASR) of MMAD were not ideal, which motivated the development of MMAD-Purify.
  • ...and 2 more figures
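
The Avg IQA score described in the Figure 4 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that normalization runs across the settings compared in one ablation sweep and that the four normalized metrics are averaged with equal weight. The function names `min_max` and `avg_iqa` and the sample values in the usage lines are hypothetical.

```python
from statistics import mean


def min_max(values):
    """Rescale a list of numbers to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def avg_iqa(psnr, ssim, fid, lpips):
    """Return one combined quality score per setting (higher is better).

    Each argument holds one metric value per setting, in the same order.
    PSNR and SSIM are higher-is-better, so they are only normalized.
    FID and LPIPS are lower-is-better, so they are inverted (1 - min-max).
    Equal weighting of the four metrics is an assumption of this sketch.
    """
    higher_is_better = [min_max(psnr), min_max(ssim)]
    lower_is_better = [[1.0 - v for v in min_max(m)] for m in (fid, lpips)]
    columns = higher_is_better + lower_is_better
    # zip(*columns) regroups the normalized values per setting.
    return [mean(per_setting) for per_setting in zip(*columns)]


# Hypothetical metric values for three settings (not taken from the paper).
scores = avg_iqa(
    psnr=[20.0, 25.0, 30.0],
    ssim=[0.5, 0.7, 0.9],
    fid=[40.0, 25.0, 10.0],
    lpips=[0.4, 0.25, 0.1],
)
```

Because every metric is rescaled relative to the best and worst setting in the sweep, the resulting scores are only comparable within that sweep, not across different experiments.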