Towards Efficient Pixel Labeling for Industrial Anomaly Detection and Localization

Hanxi Li; Jingqi Wu; Lin Yuanbo Wu; Hao Chen; Deyin Liu; Chunhua Shen

TL;DR

Industrial anomaly labeling is costly, motivating an interactive segmentation approach. ADClick converts sparse clicks, defect-specific language prompts, and residual features into dense anomaly masks, achieving high-quality labels with only 3–5 clicks and strong generalization. Extending to ADClick-Seg, the method attains state-of-the-art performance in both unsupervised and supervised anomaly detection/localization on benchmarks like MVTec AD and KolektorSDD2, using a semi-supervised, language-guided framework. The work demonstrates that combining location-aware residuals with cross-modal language guidance can significantly reduce labeling effort while delivering robust AD performance, with practical implications for real-world manufacturing pipelines.

Abstract

In practical Anomaly Detection (AD) tasks, manual labeling of anomalous pixels is costly. Consequently, many AD methods are designed as one-class classifiers, trained on sets completely devoid of anomalies, which keeps labeling costs low. While some pioneering work has demonstrated higher AD accuracy by incorporating real anomaly samples in training, this improvement comes at the price of labor-intensive labeling. This paper strikes a balance between AD accuracy and labeling expense by introducing ADClick, a novel Interactive Image Segmentation (IIS) algorithm. ADClick efficiently generates "ground-truth" anomaly masks for real defective images, leveraging innovative residual features and carefully crafted language prompts. Notably, ADClick shows a significantly higher generalization capacity than existing state-of-the-art IIS approaches. Functioning as an anomaly labeling tool, ADClick generates high-quality anomaly labels (AP $= 94.1\%$ on MVTec AD) based on only $3$ to $5$ manual click annotations per training image. Furthermore, we extend ADClick into ADClick-Seg, an enhanced model designed for anomaly detection and localization. By fine-tuning the ADClick-Seg model using the weak labels inferred by ADClick, we establish state-of-the-art performance in supervised AD tasks (AP $= 86.4\%$ on MVTec AD and AP $= 78.4\%$, PRO $= 98.6\%$ on KSDD2).
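The AP figures quoted above are pixel-level average precision scores. As a rough illustration of what such a number measures, the following minimal sketch computes non-interpolated AP over a flattened anomaly map. It assumes the standard definition (precision weighted by recall increments over descending score thresholds); the function name `average_precision` and the toy data are hypothetical and are not taken from the paper, whose exact evaluation protocol is not stated here.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: sum over thresholds of (recall increment) * precision.

    scores: per-pixel anomaly scores (higher means more anomalous).
    labels: per-pixel ground truth, 1 for anomalous and 0 for normal.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("AP is undefined without positive pixels")

    # Visit pixels from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0            # true positives accepted so far
    seen = 0          # pixels accepted so far
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Pixels tied at the same score share one threshold.
        while i < len(order) and scores[order[i]] == threshold:
            tp += labels[order[i]]
            seen += 1
            i += 1
        recall = tp / total_pos
        precision = tp / seen
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Toy flattened anomaly map (hypothetical values, for illustration only).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1]
toy_labels = [1, 0, 1, 1, 0, 0]
toy_ap = average_precision(toy_scores, toy_labels)
```

A perfect ranking, where every anomalous pixel scores above every normal pixel, yields an AP of 1; misranked normal pixels lower the precision at the recall steps that follow them.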

Paper Structure (18 sections, 10 equations, 5 figures, 8 tables)

This paper contains 18 sections, 10 equations, 5 figures, 8 tables.

Introduction
Related Work
Interactive image segmentation
Referring segmentation
Anomaly detection with language guidance
Method
Overview
Interactive segmentation based on location-aware residual features
Defect specific language prompts
The training strategy
Implementation details
Experiments
Experimental settings
Accuracy of label generation
Anomaly detection and localization
...and 3 more sections

Figures (5)

Figure 1: Illustration of the conventional approach and our proposed approach to label generation for anomaly detection and localization. Best viewed in color.
Figure 2: Illustration of the network structure of the proposed ADClick method. The model has four main input sources: the query image, the reference (defect-free) images, the language guidance, and the manual clicks. Those inputs are processed collaboratively as described in this section. Note that the workflows of ADClick (in orange) and ADClick-Seg (in green) differ slightly because of the different vision tasks and supervision conditions. Best viewed in color.
Figure 3: The generation process of our defect-specific language prompts. The user only needs to supply keywords to form the templatized instruction; the linguistic features are then generated using the ChatGPT model, the BERT algorithm, and a final averaging operation. Best viewed in color.
Figure 4: The Language-Vision Cross Attention Module employed in this paper. Note that the Swin transformer model (in the top row) is fed with images and its parameters are frozen. The PWAM modules (shown in the bottom row) fuse the vision and language features and are fine-tuned during the training stage.
Figure 5: Our labeling tool. The positive click (shown in blue) and negative click (shown in yellow) successfully guide the label generation. Best viewed in color.
