Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection

Liren He, Zhengkai Jiang, Jinlong Peng, Liang Liu, Qiangang Du, Xiaobin Hu, Wenbing Zhu, Mingmin Chi, Yabiao Wang, Chengjie Wang

TL;DR

The paper tackles multi-class unsupervised anomaly detection by addressing learning shortcuts in reconstruction-based models through a novel Reconstruct from Learnable Reference (RLR) framework. RLR replaces input-centric self-reconstruction with feature reconstruction from a learnable reference, and augments it with Local Cross Attention (LCA) and Masked Learnable Key Attention (MLKA) to robustly capture normal patterns. Evaluated on MVTec-AD and VisA, RLR achieves state-of-the-art Image-AUROC and Pixel-AUROC, demonstrating superior anomaly detection and localization under a unified setting, with code to be released. This approach offers practical impact for industrial defect detection by enabling a single model to handle multiple classes with improved reliability and interpretability of the reconstructed normal patterns.

Abstract

In the field of multi-class anomaly detection, reconstruction-based methods derived from single-class anomaly detection face the well-known challenge of "learning shortcuts", wherein the model fails to learn the patterns of normal samples as it should, opting instead for shortcuts such as identity mapping or artificial noise elimination. Consequently, the model becomes unable to reconstruct genuine anomalies as normal instances, resulting in a failure of anomaly detection. To counter this issue, we present a novel unified feature reconstruction-based anomaly detection framework termed RLR (Reconstruct features from a Learnable Reference representation). Unlike previous methods, RLR utilizes learnable reference representations to compel the model to learn normal feature patterns explicitly, thereby preventing the model from succumbing to the "learning shortcuts" issue. Additionally, RLR incorporates locality constraints into the learnable reference to facilitate more effective normal pattern capture and utilizes a masked learnable key attention mechanism to enhance robustness. Evaluation of RLR on the 15-category MVTec-AD dataset and the 12-category VisA dataset shows superior performance compared to state-of-the-art methods under the unified setting. The code of RLR will be publicly available.
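
As a rough illustration of reconstructing features from a learnable reference rather than from the input itself, the sketch below runs a single cross-attention step in plain Python. It is not the authors' implementation. The assignment of roles (reference tokens as queries, input features as keys and values), the function name `reconstruct_from_reference`, and the toy sizes are all assumptions made for illustration; the locality constraint and the masked learnable key attention mentioned in the abstract are not modeled.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruct_from_reference(reference, features):
    """Single-head cross-attention (assumed wiring, for illustration only).

    reference: N x D list of lists; would be trainable in a real model.
    features:  M x D list of lists; tokens extracted from the input image.
    Returns an N x D list: one reconstructed token per reference token.
    """
    dim = len(reference[0])
    scale = 1.0 / math.sqrt(dim)
    output = []
    for query in reference:
        # The query comes from the reference, so it does not depend on the input.
        logits = [scale * sum(q * k for q, k in zip(query, key)) for key in features]
        weights = softmax(logits)
        # Each output token is a weighted average of the input tokens.
        output.append([
            sum(w * token[d] for w, token in zip(weights, features))
            for d in range(dim)
        ])
    return output


random.seed(0)
num_tokens, dim = 4, 3  # toy sizes, chosen arbitrarily
reference = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]
features = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]
recon = reconstruct_from_reference(reference, features)
```

In this sketch the queries are fixed with respect to the input, so there is no direct path that copies input token i to output token i, which is the identity-mapping shortcut the abstract describes.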

Paper Structure (18 sections, 17 equations, 4 figures, 7 tables)

Figures (4)

  • Figure 1: Motivation and effectiveness of our method. The existing frameworks shown in (a) and (b) fall into the learning-shortcuts issue. Our framework, depicted in (c), uses a learnable reference representation for feature reconstruction to address this issue. (d) visualizes the reconstructed features: the Normal Sample, the input Anomaly Sample, and the features recovered by the Encoder, the Encoder-Decoder (specifically UniAD [you2022unified]), and our proposed method.
  • Figure 1: Qualitative results on VisA. We visualize several anomalies (Anomaly) along with their corresponding Ground Truth (GT), the detection results of UniAD (UniAD Pred), and the detection results of our method (Ours Pred).
  • Figure 2: Framework of our approach. RLR consists of Multi-Scale Feature Extraction through a pre-trained model, Feature Reconstruction with a combination of Masked Learnable Key Attention and Local Cross Attention, and Loss and Score Map calculation between the recovered features and the original features.
  • Figure 3: Qualitative results on MVTec-AD. We visualize several anomalies (Anomaly) along with their corresponding Ground Truth (GT), the detection results of UniAD (UniAD Pred), the detection results of our method (Ours Pred), and the visualization of our reconstructed features (Feature Vis).
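
The Figure 2 caption mentions a score map computed between the recovered features and the original features. A minimal sketch of one common way to do this is given below. It assumes cosine distance per spatial location and a maximum over locations as the image-level score; neither choice is confirmed by this summary, and the helper names `cosine_distance`, `score_map`, and `image_score` are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def score_map(original, reconstructed):
    """Per-location anomaly score for H x W grids of feature vectors."""
    return [
        [cosine_distance(o, r) for o, r in zip(orig_row, recon_row)]
        for orig_row, recon_row in zip(original, reconstructed)
    ]


def image_score(smap):
    """Image-level score: the largest per-location score (an assumed choice)."""
    return max(max(row) for row in smap)


# Toy 2 x 2 feature grid; the reconstruction differs at location (1, 0) only.
original = [[[1.0, 0.0], [0.0, 1.0]],
            [[1.0, 0.0], [3.0, 4.0]]]
reconstructed = [[[1.0, 0.0], [0.0, 1.0]],
                 [[0.0, 1.0], [3.0, 4.0]]]

smap = score_map(original, reconstructed)
score = image_score(smap)
```

Locations where the reconstruction matches the original score near 0, while the mismatched location scores 1, so thresholding the map would localize it. In practice the map would typically be upsampled to the input resolution before comparison with the ground-truth mask.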