Prior Normality Prompt Transformer for Multi-class Industrial Image Anomaly Detection

Haiming Yao; Yunkang Cao; Wei Luo; Weihang Zhang; Wenyong Yu; Weiming Shen

TL;DR

This work addresses multi-class industrial image anomaly detection, where reconstruction-based methods trained as a single unified model tend to fall into identical mapping (the model copies its input, anomalies included). It introduces PNPT, a dual-stream transformer framework that feeds a prior normality prompt into the reconstruction so that abnormal samples are aligned with normal templates. The architecture uses four modules (CS-NPP, HPE, SACE, and CSCD) to semantically align the prior normality with the sample's own attributes, and it is trained end to end with a cosine-based hierarchical reconstruction loss. Empirically, PNPT achieves state-of-the-art image- and pixel-level AUROC on MVTec AD, MVTec LOCO, BTAD, and a real-world button dataset at a manageable computational cost, which indicates practical potential for scalable multi-class industrial anomaly detection.
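
As an illustration of what a cosine-based hierarchical reconstruction loss can look like, the minimal sketch below averages the cosine distance between input and reconstructed feature vectors at each feature level and sums the result over levels. The function names, the nested-list data layout, and the equal weighting of levels are assumptions made for this example; the summary above does not specify them, so this is not the authors' implementation.

```python
import math


def cosine_distance(u, v, eps=1e-8):
    """Return 1 - cos(u, v) for two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero for all-zero vectors.
    return 1.0 - dot / (norm_u * norm_v + eps)


def hierarchical_cosine_loss(inputs, recons):
    """Sum over levels of the mean per-position cosine distance.

    inputs, recons: lists of levels; each level is a list of feature
    vectors, one per spatial position (hypothetical layout).
    Equal level weights are an assumption of this sketch.
    """
    total = 0.0
    for level_in, level_out in zip(inputs, recons):
        dists = [cosine_distance(u, v) for u, v in zip(level_in, level_out)]
        total += sum(dists) / len(dists)
    return total


# Two feature levels: the first has two positions, the second has one.
features = [[[1.0, 0.0], [0.0, 2.0]], [[3.0, 4.0]]]
orthogonal = [[[0.0, 1.0], [2.0, 0.0]], [[-4.0, 3.0]]]

loss_same = hierarchical_cosine_loss(features, features)
loss_orth = hierarchical_cosine_loss(features, orthogonal)
```

A perfect reconstruction gives a loss near zero, while a reconstruction whose vectors are orthogonal to the input contributes a distance of one at every level. The same per-position distances could plausibly serve as an anomaly map at inference time, though the scoring rule used by PNPT is not described in this summary.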

Abstract

Image anomaly detection plays a pivotal role in industrial inspection. Traditional approaches often demand a distinct model for each category, resulting in substantial deployment costs. This has motivated multi-class anomaly detection, where a single unified model is developed for multiple classes. However, applying conventional methods, particularly reconstruction-based models, directly to multi-class scenarios runs into problems such as identical shortcut learning, which hinders effective discrimination between normal and abnormal instances. To tackle this issue, our study introduces the Prior Normality Prompt Transformer (PNPT) method for multi-class image anomaly detection. PNPT incorporates normal semantics prompting to mitigate the "identical mapping" problem: a prior normality prompt is integrated into the reconstruction process, yielding a dual-stream model. This architecture combines normal prior semantics with abnormal samples, enabling dual-stream reconstruction grounded in both prior knowledge and intrinsic sample characteristics. PNPT comprises four essential modules: Class-Specific Normality Prompting Pool (CS-NPP), Hierarchical Patch Embedding (HPE), Semantic Alignment Coupling Encoding (SACE), and Contextual Semantic Conditional Decoding (CSCD). Experimental validation on diverse benchmark datasets and real-world industrial applications highlights PNPT's superior performance in multi-class industrial anomaly detection.

Paper Structure (34 sections, 8 equations, 8 figures, 7 tables, 1 algorithm)

This paper contains 34 sections, 8 equations, 8 figures, 7 tables, 1 algorithm.

Introduction
Related work
Implicit normality learning approaches
Explicit normality learning approaches
PNPT methodology
Model overall structure
Class-Specific Normality Prompting Pool (CS-NPP)
Hierarchical Patch Embedding (HPE)
Semantic Alignment Coupling Encoding (SACE)
Long-Distance Dependency Semantic Aggregation
Contextual Semantic Alignment Fusion
Contextual Semantics Conditional Decoding (CSCD)
Training and Inference
Training loss
Anomaly Scoring
...and 19 more sections

Figures (8)

Figure 1: Left: Comparison between the separate single-class training and unified multi-class training paradigms. (a) The separate training mode needs to assign distinct weights specific to individual categories. (b) The unified model requires only a shared weight to simultaneously execute the detection task across multiple categories. (c) The learning distributions for two training paradigms. Right: Comparison of method motivations. (d) Implicit learning methods, characterized by the implicit introduction of normal semantics during training that is subsequently omitted during testing, lead to unstable reconstruction prone to identical mapping. (e) Explicit learning methods directly and explicitly compare the samples with normal semantic templates in the memory bank, thereby being influenced by misalignment factors. (f) In contrast, the proposed normality prompting framework introduces normal semantics as prompt information for stable reconstruction.
Figure 2: PNPT Framework: PNPT employs a dual information flow structure, using the CS-NPP to extract category-specific normal prompt information for the input image. This extraction yields dual input features that cover both the normal prior and the sample itself. The dual input features are transformed into token sequences through HPE, and the sequences are then encoded via SACE. In SACE, semantic tokens $[\mathbf{Sem}]$ are added to the patch tokens of both branches, and the joint sequences undergo long-distance semantic dependency aggregation to obtain the high-level semantics and the encoded patch tokens. These high-level semantics then pass through the Contextual Semantic Alignment Fusion module for semantic alignment. Decoding occurs in CSCD, using the patch tokens of the two branches as queries and the aligned semantic tokens as keys and values. The reconstruction features are obtained via the reverse process of HPE (denoted as HPE$^{-1}$).
Figure 3: CS-NPP diagram. (a) Multi-scale feature and global coding acquisition process. (b) Class-specific normality feature and query formation. (c) Dual feature input construction. GAP and Cat. represent global average pooling and concatenation.
Figure 4: HPE schematics. The Proj. and Cat. represent projection and concatenation.
Figure 5: SACE diagram. (a) Long-Distance Dependency Semantic Aggregation. (b) Contextual Semantic Alignment Fusion.
...and 3 more figures
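
The Figure 2 caption describes decoding in which patch tokens act as queries and the aligned semantic tokens act as keys and values. The minimal sketch below shows that general pattern as single-head scaled dot-product cross-attention. It is an assumption-laden illustration: the function names are hypothetical, there are no learned projections, heads, or normalization layers, and nothing here is taken from the CSCD implementation itself.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention without learned weights.

    queries: patch tokens (one vector per patch).
    keys, values: semantic tokens (one key and one value vector each).
    Returns one output vector per query.
    """
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Each output is a weighted mix of the semantic value vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Three patch tokens attend to two semantic tokens with identical keys,
# so every patch receives the plain average of the two value vectors.
patch_tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]
semantic_keys = [[1.0, 0.0], [1.0, 0.0]]
semantic_values = [[0.0, 2.0], [2.0, 0.0]]

decoded = cross_attention(patch_tokens, semantic_keys, semantic_values)
```

Because each output is built only from the value vectors, the decoded patch features are composed from the semantic tokens rather than copied from the input patches, which is one way to read the caption's use of aligned semantics as keys and values.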
