Prior Normality Prompt Transformer for Multi-class Industrial Image Anomaly Detection
Haiming Yao, Yunkang Cao, Wei Luo, Weihang Zhang, Wenyong Yu, Weiming Shen
TL;DR
This work addresses multi-class industrial image anomaly detection, where traditional reconstruction-based methods trained as a single unified model tend to learn an identical mapping, which hinders discrimination between normal and abnormal inputs. It introduces PNPT, a dual-stream transformer framework that incorporates a prior normality prompt to guide reconstruction, steering abnormal samples toward normal templates. The architecture comprises four modules (CS-NPP, HPE, SACE, and CSCD) that semantically align prior normality with each sample's own attributes, and it is trained end to end with a cosine-based hierarchical reconstruction loss. Empirically, PNPT is reported to achieve state-of-the-art image- and pixel-level AUROC on MVTec AD, MVTec LOCO, BTAD, and a real-world button dataset while keeping computational cost manageable, which suggests practical potential for scalable multi-class industrial anomaly detection.
Abstract
Image anomaly detection plays a pivotal role in industrial inspection. Traditional approaches often require a separate model for each category, resulting in substantial deployment costs. This has motivated multi-class anomaly detection, in which a single unified model is developed for multiple classes. However, applying conventional methods, particularly reconstruction-based models, directly to multi-class scenarios runs into the "identical mapping" shortcut: the model learns to copy its input, which hinders effective discrimination between normal and abnormal instances. To tackle this issue, our study introduces the Prior Normality Prompt Transformer (PNPT) for multi-class image anomaly detection. PNPT uses normal semantics prompting to mitigate the identical mapping problem: a prior normality prompt is integrated into the reconstruction process, yielding a dual-stream model. This architecture combines normal prior semantics with the input sample, which may be abnormal, so that reconstruction is grounded in both prior knowledge and intrinsic sample characteristics. PNPT comprises four modules: Class-Specific Normality Prompting Pool (CS-NPP), Hierarchical Patch Embedding (HPE), Semantic Alignment Coupling Encoding (SACE), and Contextual Semantic Conditional Decoding (CSCD). Experimental validation on diverse benchmark datasets and real-world industrial applications highlights PNPT's superior performance in multi-class industrial anomaly detection.
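The TL;DR names a cosine-based hierarchical reconstruction loss without giving its form. As a rough illustration only, the sketch below shows one common way such a loss can be written: at each feature level, take the cosine distance between every original patch feature and its reconstruction, average over patches, and sum over levels. The function names, the nested-list data layout, and the unweighted sum across levels are assumptions made for this example, not details taken from the paper.

```python
import math


def cosine_distance(u, v, eps=1e-8):
    """Return 1 - cos(u, v) for two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero for all-zero vectors.
    return 1.0 - dot / max(norm_u * norm_v, eps)


def hierarchical_cosine_loss(original_levels, reconstructed_levels):
    """Sum over feature levels of the mean per-patch cosine distance.

    Each argument is a list of levels (hypothetical layout); each level
    is a list of patch feature vectors. Levels are weighted equally
    here, which is an assumption of this sketch.
    """
    total = 0.0
    for orig, recon in zip(original_levels, reconstructed_levels):
        dists = [cosine_distance(f, g) for f, g in zip(orig, recon)]
        total += sum(dists) / len(dists)
    return total


# Toy example: two feature levels with 2-D patch features.
features = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
perfect = [[[2.0, 0.0], [0.0, 3.0]], [[0.5, 0.5]]]    # same directions
rotated = [[[0.0, 1.0], [1.0, 0.0]], [[1.0, -1.0]]]   # orthogonal directions

loss_perfect = hierarchical_cosine_loss(features, perfect)
loss_rotated = hierarchical_cosine_loss(features, rotated)
```

Because cosine distance ignores vector magnitude, a reconstruction that preserves feature direction incurs almost no loss even when its scale differs. In reconstruction-based detectors of this kind, the same per-patch distances are often reused at inference time as an anomaly map, though whether PNPT does so is not stated in this section.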
