Memory-Augmented Dual-Decoder Networks for Multi-Class Unsupervised Anomaly Detection

Jingyu Xing, Chenwei Tang, Tao Wang, Rong Xiao, Wei Ju, Ji-Zhe Zhou, Liangli Zhen, Jiancheng Lv

TL;DR

This paper tackles multi-class unsupervised anomaly detection (MUAD) by introducing Memory-Augmented Dual-Decoder Networks (MDD-Net), which jointly address over-generalization to anomalies and insufficient reconstruction of complex normal patterns. The framework comprises a Dual-Decoder Reverse Distillation Network (DRD-Net), which pairs a Restoration Decoder with an Identity Decoder, and a Class-aware Memory Module (CMM) that stores normal prototypes and enforces class-discriminative supervision. Training combines restoration, identity, and discrepancy losses, while the memory prototypes enable targeted suppression of anomaly reconstructions. Inference fuses two complementary discrepancy signals to produce anomaly maps, achieving state-of-the-art results across four benchmarks, including industrial and medical datasets, with strong localization performance and robust generalization. The work highlights the value of decoupled reconstruction objectives and memory-guided normality modeling for scalable MUAD systems, while outlining avenues for reducing reliance on category labels and improving efficiency.
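As a rough illustration of how a memory of normal prototypes can steer features toward normality, the sketch below reads from a small memory with attention-style addressing. This is a generic memory-read pattern, not the CMM as defined in the paper: the cosine similarity, the softmax temperature, and the names `memory_read`, `memory`, and `temperature` are all assumptions made for illustration.

```python
import math


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def softmax(xs, temperature=1.0):
    """Numerically stable softmax with a temperature (assumed hyperparameter)."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(query, memory, temperature=0.1):
    """Replace a query feature with a convex combination of memory items.

    Hypothetical helper: because the output is built only from stored
    normal prototypes, anomalous content in the query is hard to reproduce.
    """
    sims = [cosine(query, item) for item in memory]
    weights = softmax(sims, temperature)
    dim = len(query)
    read = [sum(w * item[d] for w, item in zip(weights, memory)) for d in range(dim)]
    return read, weights


# Toy usage: two prototypes, one query that lies close to the first prototype.
memory = [[1.0, 0.0], [0.0, 1.0]]
read, weights = memory_read([0.9, 0.1], memory)
```

A low temperature makes the read close to a nearest-prototype lookup; a higher one blends several prototypes.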

Abstract

Recent advances in unsupervised anomaly detection (UAD) have shifted from single-class to multi-class scenarios. In such complex contexts, the increasing pattern diversity has brought two challenges to reconstruction-based approaches: (1) over-generalization: anomalies that are subtle or share compositional similarities with normal patterns may be reconstructed with high fidelity, making them difficult to distinguish from normal instances; and (2) insufficient normality reconstruction: complex normal features, such as intricate textures or fine-grained structures, may not be faithfully reconstructed due to the model's limited representational capacity, resulting in false positives. Existing methods typically focus on addressing the former, which unintentionally exacerbates the latter, resulting in inadequate representation of intricate normal patterns. To address these two challenges concurrently, we propose Memory-augmented Dual-Decoder Networks (MDD-Net). The framework includes two critical components: a Dual-Decoder Reverse Distillation Network (DRD-Net) and a Class-aware Memory Module (CMM). Specifically, the DRD-Net incorporates a restoration decoder designed to recover normal features from synthetic abnormal inputs and an identity decoder that reconstructs features while preserving the anomalous semantics. By exploiting the discrepancy between the features produced by the two decoders, our approach refines anomaly scores beyond the conventional encoder-decoder comparison paradigm, effectively reducing false positives and enhancing localization accuracy. Furthermore, the CMM explicitly encodes and preserves class-specific normal prototypes, actively steering the network away from anomaly reconstruction. Comprehensive experimental results across several benchmarks demonstrate the superior performance of our MDD-Net framework over current state-of-the-art (SoTA) approaches in multi-class UAD tasks.
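The scoring idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's formulation: it assumes the two signals are (a) the encoder-vs-restoration-decoder distance and (b) the restoration-vs-identity-decoder distance, that both are cosine distances, and that they are fused by a weighted sum. The function names and the weight `alpha` are hypothetical.

```python
import math


def cosine_distance(u, v, eps=1e-8):
    """1 - cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v + eps)


def anomaly_map(teacher, restored, identity, alpha=0.5):
    """Per-location anomaly scores from two discrepancy signals.

    teacher, restored, identity: lists of feature vectors, one per spatial
    location, from the frozen encoder, the restoration decoder, and the
    identity decoder respectively.
    alpha: assumed fusion weight between the two signals.
    """
    scores = []
    for t, r, i in zip(teacher, restored, identity):
        enc_dec = cosine_distance(t, r)  # conventional encoder-decoder signal
        dec_dec = cosine_distance(r, i)  # decoder-decoder signal
        scores.append(alpha * enc_dec + (1.0 - alpha) * dec_dec)
    return scores


# Toy usage with two locations.
# Location 0 is normal: all three features agree.
# Location 1 is anomalous: the restoration decoder "repairs" the feature,
# while the identity decoder keeps the anomalous content.
teacher = [[1.0, 0.0], [0.0, 1.0]]
restored = [[1.0, 0.0], [1.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
scores = anomaly_map(teacher, restored, identity)
```

In this toy case the normal location scores near zero on both signals. If the identity decoder reconstructs a complex normal region that the restoration decoder also reconstructs, the decoder-decoder term stays small, which is the intuition behind fewer false positives.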

Paper Structure

This paper contains 14 sections, 14 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Reconstruction error histograms and detection heatmaps of ViTAD and our method on MVTec AD.
  • Figure 2: Training process of MDD-Net, divided into five steps. (1) Normality Reconstruction: normal images are passed through the frozen Teacher Encoder, CMM, Neck, and Restoration Decoder, and the CMM parameters are updated with the $\mathcal{L}_{rec}$ and $\mathcal{L}_{cls}$ losses. (2) Anomaly Synthesis: pseudo-anomaly samples, each consisting of an anomalous image and its corresponding mask, are generated using an image-level anomaly synthesis method. (3) Anomalous Feature Extraction: the synthesized anomalies are fed into the frozen Teacher Encoder to extract features containing anomalous information. (4) Anomaly Restoration: the anomalous features extracted in step 3 are processed sequentially through the CMM, Neck, and Restoration Decoder to eliminate the anomalous content, forming the restoration loss term $\mathcal{L}_{restoration}$. (5) Anomaly Reconstruction: the anomalous features extracted in step 3 are reconstructed through the Neck and Identity Decoder, constituting the identity loss term $\mathcal{L}_{identity}$. Additionally, to enhance the feature discrepancy between the two decoders in anomalous regions, an anomaly mask supervision mechanism is introduced, resulting in the discrepancy loss term $\mathcal{L}_{dist}$.
  • Figure 3: Visualization of detection results of different methods on the MVTec AD, VisA, and Real-IAD datasets.
  • Figure 4: Visualization of detection results of different methods on Uni-Medical.
  • Figure 5: Visualization of class-specific and class-shared memory items of the CMM. Memory items are categorized by an entropy threshold (the top 10% by entropy are class-shared, shown as gray dots; the remainder are class-specific, shown as colored dots). Class-specific items are assigned to their maximum-probability classes (color-coded), while shared items connect to their two most probable class prototypes (solid lines), indicating inter-class knowledge. Red arrows highlight class-shared patterns, e.g., leather$\leftrightarrow$grid. Cluster validity is quantified by the silhouette score and the Davies-Bouldin index.
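
The categorization rule described in the Figure 5 caption can be sketched as below. The caption does not say how the per-item class probabilities are obtained, so the sketch simply takes them as input. The function name `categorize_items` and the handling of rounding for the 10% cutoff are assumptions made for illustration.

```python
import math


def entropy(p, eps=1e-12):
    """Shannon entropy of a discrete class-probability vector."""
    return -sum(x * math.log(x + eps) for x in p)


def categorize_items(class_probs, shared_fraction=0.10):
    """Split memory items into class-shared and class-specific groups.

    class_probs: one class-probability vector per memory item (assumed given).
    The highest-entropy `shared_fraction` of items are labeled "shared" and
    linked to their two most probable classes; the rest are labeled
    "specific" and assigned to their single most probable class.
    """
    n = len(class_probs)
    entropies = [entropy(p) for p in class_probs]
    n_shared = max(1, round(shared_fraction * n))  # assumed rounding rule
    by_entropy = sorted(range(n), key=lambda k: entropies[k], reverse=True)
    shared = set(by_entropy[:n_shared])

    result = []
    for k, p in enumerate(class_probs):
        ranked = sorted(range(len(p)), key=lambda c: p[c], reverse=True)
        if k in shared:
            result.append(("shared", tuple(ranked[:2])))
        else:
            result.append(("specific", (ranked[0],)))
    return result


# Toy usage: nine sharply peaked items and one ambiguous item over 3 classes.
class_probs = [[0.98, 0.01, 0.01]] * 9 + [[0.45, 0.45, 0.10]]
labels = categorize_items(class_probs)
```

Here the ambiguous last item is the only one in the top 10% by entropy, so it is marked as shared between classes 0 and 1, mirroring the solid lines in the figure.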