Test-time Adversarial Defense with Opposite Adversarial Path and High Attack Time Cost

Cheng-Han Yeh, Kuanchun Yu, Chun-Shien Lu

TL;DR

The paper addresses the vulnerability of deep models to adversarial perturbations by proposing a test-time defense that purifies inputs via Opposite Adversarial Paths (OAP) and diffusion-based purification. By generating new reference points along the opposite perturbation and integrating OAP with reverse diffusion, the method aims to push inputs away from decision boundaries and increase resistance to adaptive attacks. It introduces a diffusion-path framework with dual paths, analyzes attack cost and time complexity, and demonstrates that this approach improves both clean and robust accuracy while raising the computational burden for attackers, revealing pitfalls in evaluating diffusion-based defenses with AutoAttack. The proposed framework offers a modular, plug-in defense that can augment existing purifiers and diffusion schemes, with practical impact on strengthening test-time robustness in real-world deployments while inviting further study on attack methodologies and evaluation rigor.

Abstract

Deep learning models are known to be vulnerable to adversarial attacks that inject sophisticatedly designed perturbations into input data. Training-time defenses still exhibit a significant gap between natural accuracy and robust accuracy. In this paper, we investigate a new test-time adversarial defense method via diffusion-based recovery along opposite adversarial paths (OAPs). We present a purifier that can be plugged into a pre-trained model to resist adversarial attacks. Unlike prior art, the key idea is excessive denoising or purification: the opposite adversarial direction is integrated with reverse diffusion to push the input image further along that direction. For the first time, we also exemplify the pitfall of conducting AutoAttack (Rand) for diffusion-based defense methods. Through the lens of time complexity, we examine the trade-off between the effectiveness of an adaptive attack and its computational complexity against our defense. Experimental evaluation, along with time cost analysis, verifies the effectiveness of the proposed method.
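To make the notion of an "opposite adversarial direction" concrete, the following is a minimal toy sketch, not the paper's purifier. It assumes a two-feature logistic model with hand-picked weights and uses the true label to compute the input gradient, which would not be available at test time; the abstract does not say how the learned purifier obtains this direction. The function names (`loss_and_grad`, `opposite_adversarial_path`) and all numeric values are hypothetical. An adversarial step moves the input along the sign of the loss gradient; the opposite step moves it against that sign, and `k` consecutive opposite steps give a toy analogue of an opposite adversarial path.

```python
import math


def loss_and_grad(w, b, x, y):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    eps = 1e-12  # guards against log(0)
    loss = -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    grad = [(p - y) * wi for wi in w]  # d(loss)/dx for a logistic model
    return loss, grad


def sign(v):
    return (v > 0) - (v < 0)


def step_along(x, grad, step, direction):
    """direction=+1 is an adversarial (FGSM-like) step; direction=-1 is the opposite step."""
    return [xi + direction * step * sign(g) for xi, g in zip(x, grad)]


def opposite_adversarial_path(w, b, x, y, step=0.05, k=5):
    """Take k consecutive steps against the adversarial direction (toy assumption)."""
    path = [x]
    for _ in range(k):
        _, g = loss_and_grad(w, b, path[-1], y)
        path.append(step_along(path[-1], g, step, direction=-1))
    return path


# Hypothetical model parameters and input, chosen only for illustration.
w, b = [1.0, -2.0], 0.1
x, y = [0.3, 0.1], 1

path = opposite_adversarial_path(w, b, x, y)
losses = [loss_and_grad(w, b, p, y)[0] for p in path]

# For comparison: a single adversarial step from the same starting point.
_, g0 = loss_and_grad(w, b, x, y)
adv_loss = loss_and_grad(w, b, step_along(x, g0, 0.05, direction=+1), y)[0]
```

In this toy setting the loss shrinks at every opposite step, so the input drifts away from the decision boundary, whereas the single adversarial step raises the loss. The method described in the abstract combines this kind of direction with reverse diffusion rather than with raw gradient steps.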

Paper Structure

This paper contains 31 sections, 16 equations, 5 figures, 10 tables, 1 algorithm.

Figures (5)

  • Figure 1: Concept diagram of new reference point generation via $K$ consecutive purifications along opposite adversarial paths (OAPs).
  • Figure 2: Flowchart of our method. The purifier (gray block) can be one of (a)-(c), where (a) is the proposed baseline purifier, (b) shows the combination of baseline purifier and reverse diffusion, and (c) expands (b) with two diffusion paths. In (c), $x_1^{tar}, \ldots, x_C^{tar}$ are obtained via Eq. (\ref{eq: data generation rule}) from fixed $C$ images with one image per class. The image in front of Color OT with green/blue arrow is called the source/target image. $x^{p_2}$ is defined in Eq. (\ref{eq: color ot}).
  • Figure 3: (Ideal model) Red arrows depict directions to minimize $\ell_2$ distance between the intermediate images of two reverse paths, $p_1$ and $p_2$. $\mathcal{L}$: loss function.
  • Figure 4: Reverse diffusion process implementations: The original implementation of DiffPure involves only one function call in reverse and adjoint solver calls. The PGD+EOT attack utilizes a surrogate diffusion process with fewer steps than purification steps. However, in our implementation, we use the same number of steps for purification and attack.
  • Figure 5: Intermediate images generated from Fig. 2(c). From top to bottom: clean image $x$, $x^{p_2}$, $x^{p_1}_{t^*}$, $\widehat{x^{p_1}}$, $x^{p_2}_{t^*}$, $\widehat{x^{p_2}}$, and purified image $\widehat{x_{clean}}$.