Continuous Memory Representation for Anomaly Detection

Joo Chan Lee, Taejune Kim, Eunbyung Park, Simon S. Woo, Jong Hwan Ko

TL;DR

CRAD presents a continuous memory framework for unsupervised anomaly detection by representing normal features on two continuous grids (local and global) and sampling via learned coordinates. The method uses coordinate jittering and a feature refinement step to improve generalization and reduce false positives, achieving state-of-the-art performance in unified multi-class anomaly detection on MVTec AD and strong results on VisA. By replacing discrete memory with continuous grids and interpolative sampling, CRAD offers improved generalization, reduced identity shortcuts, and efficient memory usage, enabling robust detection and localization across multiple object classes. The work demonstrates practical impact for scalable, cross-class anomaly detection with compact memory and fast inference, while acknowledging limitations in extremely scarce data regimes and suggesting avenues for future work.
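The idea of sampling normal features from a continuous grid via coordinates can be illustrated with a small sketch. This is not the authors' implementation: it assumes a fixed 2D grid (in CRAD the grids are learned), bilinear interpolation as the interpolative sampling scheme, and Gaussian noise as the jittering distribution. The function names `bilinear_sample` and `jitter` and the parameter `sigma` are hypothetical.

```python
import math
import random

def bilinear_sample(grid, u, v):
    """Sample a feature vector from `grid` at a continuous coordinate (u, v) in [0, 1]^2.

    `grid` is a nested list of shape [H][W][C]. The result blends the four
    surrounding grid cells, so nearby coordinates yield nearby features.
    """
    h, w = len(grid), len(grid[0])
    # Clamp the normalized coordinates, then map them onto grid index space.
    x = min(max(u, 0.0), 1.0) * (w - 1)
    y = min(max(v, 0.0), 1.0) * (h - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = x - x0, y - y0
    channels = len(grid[0][0])
    return [
        (1 - ty) * ((1 - tx) * grid[y0][x0][c] + tx * grid[y0][x1][c])
        + ty * ((1 - tx) * grid[y1][x0][c] + tx * grid[y1][x1][c])
        for c in range(channels)
    ]

def jitter(u, v, sigma, rng):
    """Perturb a coordinate with Gaussian noise (an assumed noise model)."""
    return u + rng.gauss(0.0, sigma), v + rng.gauss(0.0, sigma)

# Toy 2x2 grid with one channel per cell (values are illustrative only).
grid = [[[0.0], [1.0]],
        [[2.0], [3.0]]]

center = bilinear_sample(grid, 0.5, 0.5)   # blend of all four cells
rng = random.Random(0)
ju, jv = jitter(0.5, 0.5, sigma=0.05, rng=rng)
jittered = bilinear_sample(grid, ju, jv)   # feature at a slightly shifted coordinate
```

Because the sampled feature varies smoothly with the coordinate, a small jitter produces a small change in the output, which is the property a continuous memory relies on and a discrete lookup lacks.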

Abstract

Unsupervised anomaly detection, where only normal images are available for training, has advanced significantly. Several recent methods detect anomalies using a memory, comparing or reconstructing the input with directly stored normal features (or features trained on normal images). However, such memory-based approaches operate on a discrete feature space implemented by a nearest-neighbor or attention mechanism, and therefore suffer from poor generalization or from an identity shortcut that outputs the input unchanged, respectively. Furthermore, most existing methods are designed to detect single-class anomalies, resulting in unsatisfactory performance when presented with multiple classes of objects. To tackle all of these challenges, we propose CRAD, a novel anomaly detection method that represents normal features within a "continuous" memory, enabled by transforming spatial features into coordinates and mapping them to continuous grids. We also carefully design the grids for anomaly detection, so that they represent both local and global normal features and fuse them effectively. Our extensive experiments demonstrate that CRAD successfully generalizes the normal features and mitigates the identity shortcut. Moreover, CRAD effectively handles diverse classes in a single model thanks to its high-granularity continuous representation. In an evaluation on the MVTec AD dataset, CRAD significantly outperforms the previous state-of-the-art method, reducing the error by 65.0% for multi-class unified anomaly detection. The project page is available at https://tae-mo.github.io/crad/.
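For contrast, the discrete nearest-neighbor memory that the abstract describes as prior work can be sketched in a few lines. This is a generic illustration rather than any specific published method: the stored features, the Euclidean distance, and the name `nearest_neighbor_score` are assumptions made for the example.

```python
import math

def nearest_neighbor_score(feature, memory):
    """Anomaly score: Euclidean distance from `feature` to the closest stored normal feature."""
    return min(math.dist(feature, stored) for stored in memory)

# A tiny discrete memory of normal features (illustrative values only).
memory = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

normal_like = [0.9, 0.1]   # close to a stored normal feature
unusual = [3.0, 4.0]       # far from every stored feature

score_normal = nearest_neighbor_score(normal_like, memory)
score_unusual = nearest_neighbor_score(unusual, memory)
```

The score depends only on the finite set of stored entries, so a normal feature that falls between stored entries can still receive a large distance. That is one way to read the poor generalization the abstract attributes to discrete memories.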

Paper Structure

This paper contains 23 sections, 7 equations, 6 figures, and 10 tables.

Figures (6)

  • Figure 1: Conceptual diagram and qualitative results of existing methods and ours. (a) and (b) use single and multiple normal features in a discrete memory, respectively, while our method (c) exploits continuous feature memory. We visualize the anomaly detection process with the normal (navy) and abnormal (red) patches of the top-left reference image. 'Pred.' indicates the prediction based on the disparity, and wrong predictions are marked as (X) with red color. We present the reconstruction results based on the reference abnormal images.
  • Figure 2: (a) The detailed architecture of CRAD and (b) visualization of coordinate jittering. The input $x$ is first transformed into pixel-wise and feature-wise coordinates. After the normal features are sampled from the local and global representations, they are fused by CNN blocks. The final reconstruction is obtained through the proposed feature refinement process.
  • Figure 3: Visualization of CRAD's pipeline. Each marker in (a) corresponds to the patch on the left image that has the same number and color. Each marker in (b) corresponds to a single image from the test dataset, where different colors represent distinct classes, and circles and triangles denote the normal and abnormal images, respectively. 'Dim 1' and 'Dim 2' are the two dimensions of 2D grids.
  • Figure 4: Visualization of the contents mapped at a continuous grid. We manually select six global coordinates and visualize the corresponding sampled normal features.
  • Figure 5: Qualitative results of CRAD on MVTec AD. Each row of the figure represents anomaly images, corresponding ground truths, results from UniAD, and our results.
  • ...and 1 more figures