PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies

Mojtaba Nafez, Amirhossein Koochakian, Arad Maleki, Jafar Habibi, Mohammad Hossein Rohban

TL;DR

PatchGuard addresses the vulnerability of anomaly detection (AD) and anomaly localization (AL) to adversarial perturbations by generating foreground-aware pseudo-anomalies from normal data and training a Vision Transformer (ViT) with an attention-regularized loss. The method combines Grad-CAM-guided pseudo-anomaly generation, a patch-wise Attention Discriminator, and a loss that increases the ViT’s last-layer attention degree, supported by theoretical analysis. Under PGD-1000 attacks, PatchGuard improves robustness by up to 53.2% in AD and 68.5% in AL across eight industrial and medical datasets while remaining competitive in clean settings, and it outperforms adapted state-of-the-art methods in adversarial scenarios. These results point to a practical path toward reliable, pixel-precise anomaly detection and localization in high-resolution applications; the code is available in the authors’ repository.

Abstract

Anomaly Detection (AD) and Anomaly Localization (AL) are crucial in fields that demand high reliability, such as medical imaging and industrial monitoring. However, current AD and AL approaches are often susceptible to adversarial attacks because their training data typically contains only normal, unlabeled samples. This study introduces PatchGuard, an adversarially robust AD and AL method that incorporates pseudo anomalies with localization masks within a Vision Transformer (ViT)-based architecture to address these vulnerabilities. We begin by examining the essential properties of pseudo anomalies, and then provide theoretical insights into the attention mechanisms required to enhance the adversarial robustness of AD and AL systems. We then present our approach, which leverages Foreground-Aware Pseudo-Anomalies to overcome the deficiencies of previous anomaly-aware methods. Our method incorporates these crafted pseudo-anomaly samples into a ViT-based framework, with adversarial training guided by a novel loss function designed to improve model robustness, as supported by our theoretical analysis. Experimental results on well-established industrial and medical datasets demonstrate that PatchGuard significantly outperforms previous methods in adversarial settings, achieving performance gains of $53.2\%$ in AD and $68.5\%$ in AL, while also maintaining competitive accuracy in non-adversarial settings. The code repository is available at https://github.com/rohban-lab/PatchGuard .
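
For readers unfamiliar with the attack family used in the evaluation, the following is a minimal sketch of projected gradient descent (PGD) under an L-infinity budget. It is not the authors' evaluation code. The linear scorer `score`, the budget `eps = 8/255`, the step size, and the helper names are all assumptions made for illustration; "PGD-1000" presumably corresponds to running this loop with `n_steps=1000` against the real model's gradients.

```python
import random


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def clip(v, lo, hi):
    """Clamp v to the closed interval [lo, hi]."""
    return max(lo, min(hi, v))


def pgd_linf(x, grad_fn, eps, step_size, n_steps, seed=0):
    """Gradient-sign ascent projected onto the L-infinity ball of radius eps around x."""
    rng = random.Random(seed)
    # Random start inside the eps-ball, kept within the valid pixel range [0, 1].
    x_adv = [clip(xi + rng.uniform(-eps, eps), 0.0, 1.0) for xi in x]
    for _ in range(n_steps):
        grad = grad_fn(x_adv)
        # Take a signed step, project back into the eps-ball, then into [0, 1].
        x_adv = [
            clip(clip(xa + step_size * sign(g), xi - eps, xi + eps), 0.0, 1.0)
            for xi, xa, g in zip(x, x_adv, grad)
        ]
    return x_adv


# Hypothetical stand-in for a detector: a linear anomaly score s(x) = w . x.
w = [0.8, -0.3, 0.5, -0.9]


def score(x):
    return sum(wi * xi for wi, xi in zip(w, x))


def score_grad(x):
    # The gradient of a linear score with respect to x is simply w.
    return list(w)


x_clean = [0.5, 0.5, 0.5, 0.5]  # a "normal" sample
eps = 8 / 255                   # assumed perturbation budget
x_adv = pgd_linf(x_clean, score_grad, eps=eps, step_size=2 / 255, n_steps=10)

print("clean score:", score(x_clean))
print("adversarial score:", score(x_adv))
```

Here the attack pushes a normal sample toward a higher anomaly score while every coordinate stays within `eps` of its clean value. Attacking an anomalous sample would instead descend the score, which amounts to negating the gradient returned by `grad_fn`.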

Paper Structure

This paper contains 32 sections, 9 equations, 4 figures, 21 tables.

Figures (4)

  • Figure 1: Impact of Adversarial Attacks on Anomaly Localization Methods: Localization maps for multiple methods are shown before and after a PGD-1000 attack, illustrating the vulnerability of existing methods to adversarial attacks, even when performing perfectly in clean conditions. Our proposed method demonstrates enhanced robustness in these adversarial scenarios.
  • Figure 2: The figure demonstrates how increasing the average last-layer attention degree in the ViT-base architecture reduces vulnerability to adversarial attacks. Specifically, the BraTS dataset images were clustered into five groups based on their attention degree values (x-axis). The y-axis represents Vulnerability, measured as the absolute difference in the model's localization performance (AUROC%) for each cluster before and after a PGD attack. (a) illustrates the decreasing trend in a clean-trained model. (b) shows a similar decreasing trend in the adversarially trained model, where the attention degree is relatively higher than in the clean model.
  • Figure 3: Overview of the PatchGuard framework. (1) We use Grad-CAM to identify regions likely to belong to the foreground in an image. By applying $k_{\text{soft}}=3$ augmentations to the image $x$, we generate a combined saliency map $G(x)$. (2) The input $x$ and its saliency map $G(x)$ are then passed to the anomaly generator to produce an anomaly sample $x'$ along with its ground-truth mask $x'_m$. (3) For each normal sample $x_i$, the input batch includes the image, its corresponding anomaly version $x'_i$, and their adversarially attacked variants, along with their ground-truth masks. Each batch sample is processed through a Vision Transformer (ViT), where the Attention Discriminator assigns an anomaly score $p_i$ to each patch embedding. The final anomaly map is constructed from these scores and trained with a novel loss $\mathcal{L}$ to replicate the ground truth.
  • Figure 4: Visualization of the Pseudo-Anomalies Generated for Each Dataset. Each group corresponds to one dataset: MVTec AD, VisA, BTAD, MPDD, WFDD, DTD-Synthetic, BraTS2021, and Head-CT. Within each group, columns represent randomly selected samples from the respective dataset. The first row shows a normal image, the second row shows the corresponding generated pseudo-anomaly image, and the third row shows the associated anomaly mask.
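
Steps (2) and (3) of the Figure 3 caption can be pictured with a toy sketch: a pseudo-anomaly is placed on the foreground indicated by a saliency map, and its pixel mask is reduced to per-patch targets that a patch-wise scorer could be trained against. This is only an illustration under stated assumptions and not the authors' generator. The noise-square corruption, the `fg_threshold` cutoff, the 4-pixel patch size, and the function names `make_pseudo_anomaly` and `patch_labels` are hypothetical, and the saliency map is hand-made rather than computed with Grad-CAM.

```python
import random


def make_pseudo_anomaly(image, saliency, size=4, fg_threshold=0.5, seed=0):
    """Overwrite a size x size square that covers a randomly chosen foreground pixel.

    image, saliency: equally sized 2D lists with values in [0, 1].
    Returns (x_prime, mask); mask is 1 on altered pixels and 0 elsewhere.
    """
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    # Foreground = pixels whose saliency exceeds the (assumed) threshold.
    foreground = [
        (r, c) for r in range(h) for c in range(w) if saliency[r][c] > fg_threshold
    ]
    if not foreground:
        raise ValueError("saliency map has no pixels above fg_threshold")
    cr, cc = rng.choice(foreground)
    # Keep the square inside the image while still covering the chosen pixel.
    top = min(max(cr - size // 2, 0), h - size)
    left = min(max(cc - size // 2, 0), w - size)
    x_prime = [row[:] for row in image]
    mask = [[0] * w for _ in range(h)]
    for r in range(top, top + size):
        for c in range(left, left + size):
            x_prime[r][c] = rng.random()  # assumed corruption: uniform noise
            mask[r][c] = 1
    return x_prime, mask


def patch_labels(mask, patch=4):
    """Mark a patch as anomalous (1) if any pixel inside it is masked."""
    h, w = len(mask), len(mask[0])
    return [
        [
            int(any(mask[r][c]
                    for r in range(pr, pr + patch)
                    for c in range(pc, pc + patch)))
            for pc in range(0, w, patch)
        ]
        for pr in range(0, h, patch)
    ]


# A flat 16 x 16 gray image whose "object" occupies the central 8 x 8 region.
x = [[0.5] * 16 for _ in range(16)]
G = [[1.0 if 4 <= r < 12 and 4 <= c < 12 else 0.0 for c in range(16)]
     for r in range(16)]

x_prime, x_prime_m = make_pseudo_anomaly(x, G)
targets = patch_labels(x_prime_m)  # 4 x 4 grid of per-patch targets
for row in targets:
    print(row)
```

Sampling the location from the saliency foreground is what keeps the synthetic defect on the object rather than on the background. The per-patch targets correspond to the granularity at which the caption says the Attention Discriminator assigns its scores $p_i$.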