Toward Multi-class Anomaly Detection: Exploring Class-aware Unified Model against Inter-class Interference

Xi Jiang, Ying Chen, Qiang Nie, Jianlin Liu, Yong Liu, Chengjie Wang, Feng Zheng

TL;DR

The paper addresses multi-class anomaly detection by mitigating inter-class interference in unified models. It introduces MINT-AD, a transformer-based reconstruction framework that uses a class-aware implicit neural representation (INR) to generate per-class queries, supervised by a classification loss $L_{CE}$ and a prior distribution loss $L_{Prior}$ alongside the reconstruction loss $L_{MSE}$. The approach achieves state-of-the-art or competitive results on MVTec-AD, VisA, CIFAR-10, and a larger unified dataset, with improved anomaly localization and robustness to background noise. These results demonstrate that using category information during training with an INR is a feasible route to scalable, robust multi-class anomaly detection for industrial inspection.

Abstract

In the context of high usability in single-class anomaly detection models, recent academic research has become concerned about the more complex multi-class anomaly detection. Although several papers have designed unified models for this task, they often overlook the utility of class labels, a potent tool for mitigating inter-class interference. To address this issue, we introduce a Multi-class Implicit Neural representation Transformer for unified Anomaly Detection (MINT-AD), which leverages the fine-grained category information in the training stage. By learning the multi-class distributions, the model generates class-aware query embeddings for the transformer decoder, mitigating inter-class interference within the reconstruction model. Utilizing such an implicit neural representation network, MINT-AD can project category and position information into a feature embedding space, further supervised by classification and prior probability loss functions. Experimental results on multiple datasets demonstrate that MINT-AD outperforms existing unified training models.
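
Read together with the TL;DR, the three training signals can be written as one objective. The weighted-sum form and the weights $\lambda_{CE}$ and $\lambda_{Prior}$ below are assumptions made for illustration; this summary does not state how the terms are balanced.

```latex
L_{total} = L_{MSE} + \lambda_{CE} \, L_{CE} + \lambda_{Prior} \, L_{Prior}
```

Under this reading, $L_{MSE}$ drives feature reconstruction, while $L_{CE}$ and $L_{Prior}$ shape the class-aware embedding that produces the decoder queries.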

Paper Structure (34 sections, 9 equations, 10 figures, 15 tables)


Figures (10)

  • Figure 1: Decision boundary of reconstruction models. The "identical shortcut" occurs when a single-class model is trained on multi-class data. Previous unified models have significantly mitigated this issue, so "inter-class interference" is now the primary challenge facing unified models.
  • Figure 2: Baseline and three class-aware improvements: (a) The vanilla UniAD (You et al., 2022) has a shared query for all categories in each layer of the decoder. (b) One simple approach is to use different queries for different classes. (c) Concatenating a class token before the image feature is an intuitive alternative. (d) Using a network to map a prompt into a query can also incorporate category information.
  • Figure 3: Architecture of the MINT-AD network. (1) A dual-path INR network with different activation functions maps the position encoding and the class-aware prompt to the image feature dimension; its outputs serve as queries that assist the reconstruction transformer through cross-attention. (2) The goal is a fine-grained prompt that can capture the distribution of subcategories. (3) Compared with obtaining prompts from (a) the image or (b) the label, (c) subcategory features taken from a network that classifies the image into its label make a better prompt.
  • Figure 4: (a, b) t-SNE visualization of image features: triangles for normal samples, circles for anomalies. Unlike prior work, our method leverages category data, making the distinctions between different classes more pronounced post-reconstruction, as the outlier samples align closer to their respective normal class counterparts. (c) The self-attention map of queries from MLP and INR. The coordinate-based network architecture of INR can model more structural information.
  • Figure 5: Decision boundaries on 2D synthetic multi-class data with different models. (a) vanilla MLP-based autoencoder, the "identical shortcut" occurs even during training with a single cluster. (b) denoising AE, where the reconstruction task is translated into a denoising task. The "shortcut" problem is solved, but "inter-class interference" occurs in a unified training setting.
  • ...and 5 more figures
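
The query variants in Figure 2 can be illustrated with a small NumPy sketch. This is a simplified illustration and not the authors' implementation: the sizes, the random "learned" weights, the one-hot prompt and position codes, and the function names `select_queries` and `prompt_to_queries` are all hypothetical. The actual MINT-AD design (Figure 3) uses a dual-path INR and a prompt derived from a classification network.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_CLASSES = 3  # hypothetical number of object categories
NUM_TOKENS = 4   # hypothetical number of query tokens (feature positions)
DIM = 8          # hypothetical feature dimension

# (a) Shared query: a single table used for every category.
shared_query = rng.normal(size=(NUM_TOKENS, DIM))

# (b) Per-class queries: one table per category, selected by class label.
class_queries = rng.normal(size=(NUM_CLASSES, NUM_TOKENS, DIM))


def select_queries(labels):
    """Return the (tokens, dim) query table for each sample's class label."""
    return class_queries[np.asarray(labels)]


# (d) Prompt-to-query mapping: a tiny two-layer network turns a class prompt
# plus a position code into a query vector. Weights stand in for learned ones.
W1 = rng.normal(size=(NUM_CLASSES + NUM_TOKENS, 16))
W2 = rng.normal(size=(16, DIM))


def prompt_to_queries(labels):
    """Map one-hot class prompts and one-hot positions to per-token queries."""
    labels = np.asarray(labels)
    batch = labels.shape[0]
    prompt = np.eye(NUM_CLASSES)[labels]   # (batch, classes)
    position = np.eye(NUM_TOKENS)          # (tokens, tokens)
    # Pair every prompt with every position: (batch, tokens, classes + tokens).
    inputs = np.concatenate(
        [
            np.repeat(prompt[:, None, :], NUM_TOKENS, axis=1),
            np.repeat(position[None, :, :], batch, axis=0),
        ],
        axis=-1,
    )
    hidden = np.maximum(inputs @ W1, 0.0)  # ReLU activation
    return hidden @ W2                     # (batch, tokens, dim)


labels = [0, 2, 2]
print(select_queries(labels).shape)     # per-class queries for the batch
print(prompt_to_queries(labels).shape)  # network-generated queries
```

In both class-aware variants, samples that share a label receive identical queries and samples with different labels receive different ones, which is the property that lets the decoder separate categories. Variant (d) shares parameters across classes rather than storing one table per class.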