Industrial Anomaly Detection and Localization Using Weakly-Supervised Residual Transformers

Hanxi Li, Jingqi Wu, Deyin Liu, Lin Wu, Hao Chen, Mingwen Wang, Chunhua Shen

TL;DR

WeakREST tackles industrial anomaly detection with scarce anomalous data by reframing pixel-level localization as block-wise classification and introducing PosFAR residuals to feed a residual-aware Swin Transformer. The framework also leverages weak annotations (bounding boxes and image tags) via ResMixMatch to exploit unlabeled regions, reducing labeling effort while sustaining high accuracy. Across MVTec-AD, MVTec-3D, and KolektorSDD2, WeakREST achieves state-of-the-art AP in both unsupervised and supervised settings and, with only weak labels, even surpasses prior methods trained with pixel-wise supervision. The approach combines efficient residual-based feature representations, a tailored transformer backbone, and semi-supervised learning to deliver practical, annotation-efficient industrial AD with strong localization performance.

Abstract

Recent advancements in industrial anomaly detection (AD) have demonstrated that incorporating a small number of anomalous samples during training can significantly enhance accuracy. However, this improvement often comes at the cost of extensive annotation efforts, which are impractical for many real-world applications. In this paper, we introduce a novel framework, Weakly-supervised RESidual Transformer (WeakREST), designed to achieve high anomaly detection accuracy while minimizing the reliance on manual annotations. First, we reformulate the pixel-wise anomaly localization task into a block-wise classification problem. Second, we introduce a residual-based feature representation called Positional Fast Anomaly Residuals (PosFAR) which captures anomalous patterns more effectively. To leverage this feature, we adapt the Swin Transformer for enhanced anomaly detection and localization. Additionally, we propose a weak annotation approach, utilizing bounding boxes and image tags to define anomalous regions. This approach establishes a semi-supervised learning context that reduces the dependency on precise pixel-level labels. To further improve the learning process, we develop a novel ResMixMatch algorithm, capable of handling the interplay between weak labels and residual-based representations. On the benchmark dataset MVTec-AD, our method achieves an Average Precision (AP) of $83.0\%$, surpassing the previous best result of $82.7\%$ in the unsupervised setting. In the supervised AD setting, WeakREST attains an AP of $87.6\%$, outperforming the previous best of $86.0\%$. Notably, even when using weaker annotations such as bounding boxes, WeakREST exceeds the performance of leading methods relying on pixel-wise supervision, achieving an AP of $87.1\%$ compared to the prior best of $86.0\%$ on MVTec-AD.

Paper Structure (34 sections, 16 equations, 7 figures, 9 tables, 1 algorithm)

Figures (7)

  • Figure 1: The comparison between the proposed weak annotation strategy and the conventional paradigm. Unlike traditional pixel-wise labels (see the top red box), our proposed annotations are categorized into three levels. Row-$1$: The AD problem is reformulated as block-wise binary classification. Normal samples, anomalous samples, and ignored samples are represented in blue, orange, and gray, respectively. This approach significantly reduces annotation granularity. Row-$2$: A weaker labeling strategy using bounding boxes that encompass entire anomalous regions. This eliminates the need for pixel-level detail while still preserving key information about the defect. Row-$3$: The weakest label, using only tags that indicate the defective status of the image. The numbers in parentheses denote the orders of magnitude (from $10^{4}$ to $1$) of the annotation clicks under the three levels of weak annotations. Best viewed in color.
  • Figure 2: The overview of the INFERENCE process of WeakREST, which consists of three modules: feature extraction (see Sec. \ref{subsec:PosFAR}), PosFAR residual generator (see Sec. \ref{subsec:PosFAR}) and Swin Transformer module for block-wise anomaly classification (see Sec. \ref{subsec:swin}). In this residual-based AD algorithm, the query information (from the test image) and reference information (from the training images) are utilized cooperatively to achieve high accuracy of anomaly detection and localization.
  • Figure 3: The block labeling strategy. Blocks with more than $\epsilon^{+}\rho^2$ anomaly pixels are labeled $1$ (red), while blocks with fewer than $\epsilon^{-}\rho^2$ anomaly pixels are labeled $-1$ (blue). The remaining blocks are labeled $\emptyset$ and are ignored in the training phase.
  • Figure 4: Three types of weak labels considered in this paper. From left to right: the "rotated bounding-boxes" (left), the "axis-aligned bounding-boxes" (middle) and the "image-level labels" (right). The lower part of each column illustrates the block-wise label conversion for the corresponding weak label.
  • Figure 5: The overview of the ResMixMatch training paradigm. The weak label only defines the non-defective region and the unknown region. It is used to "fix" the estimated label predicted by the Swin-Transformer model $\Psi_{swin}(\cdot)$. As the name suggests, the proposed ResMixMatch algorithm trains its network model by using the "mixed" labels and residuals.
  • ...and 2 more figures
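
The block labeling rule described in the Figure 3 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes $\rho$ is the side length of a square block (so a block holds $\rho^2$ pixels), that the image height and width are divisible by $\rho$, and that the ignored label $\emptyset$ is encoded as `0`. The function name `block_labels` and the threshold values in the usage example are hypothetical.

```python
import numpy as np

IGNORE = 0  # assumed encoding of the "ignored" label (the empty set in the caption)


def block_labels(mask, rho, eps_pos, eps_neg):
    """Convert a pixel-wise anomaly mask into block-wise labels.

    mask    : (H, W) array, nonzero where a pixel is anomalous
    rho     : assumed block side length, so each block has rho**2 pixels
    eps_pos : fraction above which a block is labeled anomalous (+1)
    eps_neg : fraction below which a block is labeled normal (-1)
    """
    h, w = mask.shape
    if h % rho or w % rho:
        raise ValueError("mask height and width must be divisible by rho")

    # Count anomalous pixels inside every rho x rho block.
    binary = mask > 0
    counts = binary.reshape(h // rho, rho, w // rho, rho).sum(axis=(1, 3))

    # Blocks between the two thresholds keep the IGNORE label.
    labels = np.full(counts.shape, IGNORE, dtype=np.int8)
    labels[counts > eps_pos * rho**2] = 1
    labels[counts < eps_neg * rho**2] = -1
    return labels
```

For example, on an $8 \times 8$ mask with `rho=4`, `eps_pos=0.5` and `eps_neg=0.1`, a fully anomalous block receives label `1`, an empty block receives `-1`, and a block with 4 anomalous pixels (between the thresholds of 1.6 and 8) is ignored during training.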