A Recover-then-Discriminate Framework for Robust Anomaly Detection

Peng Xing, Dong Zhang, Jinhui Tang, Zechao Li

TL;DR

This work addresses gaps in anomaly detection (AD) by identifying two core failure modes in recovery-based methods and proposing a Recover-then-Discriminate (ReDi) framework. The Recover Network employs HIP, which uses a self-generated map (e.g., HOG) and a similar normal image as a prompt to guide recovery without revealing anomalous details, mitigating recovery shortcuts. The Discriminate Network compares multi-scale features from a reference (pre-trained) branch and a recovery branch, guided by a cosine-alignment loss $L_D$ and a self-correlation loss $L_S$, enabling effective detection and precise segmentation in feature space. Extensive experiments on MVTec-AD and KolektorSDD2 show state-of-the-art or competitive performance for both anomaly detection and segmentation. Ablations validate the benefits of HIP, $L_S$, and the Feature Recovery Block (FRB), and are accompanied by an analysis of self-generated maps and prompts. The approach offers a practical, unsupervised AD solution that leverages low-level structure and normal semantic information to robustly identify anomalies in complex visual data.
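
To make the idea of a self-generated map more concrete, the following is a minimal sketch of a HOG-style descriptor in NumPy. It is an illustration only, not the authors' implementation: the function name `hog_map`, the cell size, and the number of orientation bins are assumptions, and the block normalization and bin interpolation of a full HOG pipeline are omitted. The point it shows is that such a map keeps coarse edge structure while discarding pixel-level detail.

```python
import numpy as np

def hog_map(gray, cell=8, bins=9):
    """Simplified HOG: per-cell histogram of unsigned gradient orientations.

    gray: 2-D array (H, W). Returns an array of shape (H // cell, W // cell, bins).
    Hypothetical helper for illustration; not taken from the paper.
    """
    gray = gray.astype(np.float64)
    gy, gx = np.gradient(gray)                # derivatives along rows, then columns
    mag = np.hypot(gx, gy)                    # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)
    # Map each orientation to a bin; clamp guards against rounding up to `bins`.
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)

    n_y, n_x = gray.shape[0] // cell, gray.shape[1] // cell
    hist = np.zeros((n_y, n_x, bins))
    for i in range(n_y):
        for j in range(n_x):
            sl = (slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell))
            # Magnitude-weighted vote of every pixel in the cell.
            hist[i, j] = np.bincount(
                bin_idx[sl].ravel(), weights=mag[sl].ravel(), minlength=bins
            )
    # L2-normalise each cell histogram.
    hist /= np.linalg.norm(hist, axis=-1, keepdims=True) + 1e-8
    return hist
```

For a 32x32 image with the default settings this yields a 4x4 grid of 9-bin histograms, which is far too coarse to reproduce a small defect pixel for pixel but still encodes where the edges are and how they are oriented.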

Abstract

Anomaly detection (AD) has been extensively studied and applied in a wide range of scenarios in recent years. However, a gap remains between the recognition accuracy achieved so far and the level that practical applications require. In this paper, we start from an analysis of two fundamental yet representative types of failure case in the baseline model, and reveal what prevents current AD methods from achieving higher recognition accuracy. Specifically, from Case-1, we found that the main factor harming current AD methods is that the inputs to the recovery model contain a large number of detailed features to be recovered, so that normal areas are not recovered to their original state while abnormal areas are. From Case-2, we found, surprisingly, that abnormal areas that cannot be recognized in image-level representations can be easily recognized in feature-level representations. Based on these observations, we propose a novel Recover-then-Discriminate (ReDi) framework for AD. ReDi takes a self-generated feature map and a selected prompt image as explicit inputs to address the problems in Case-1. In addition, a feature-level discriminative network is proposed to enhance the abnormal differences between the recovered representation and the input representation. Extensive experimental results on two popular yet challenging AD datasets validate that ReDi achieves new state-of-the-art accuracy.

Paper Structure (28 sections, 10 equations, 8 figures, 10 tables)

Figures (8)

  • Figure 1: The key motivations of this paper. In Case-1, we found that the main factor harming current approaches to anomaly detection is that the normal area has not been recovered to its original state, whereas the abnormal area has. In Case-2, we found that an abnormal area that cannot be recognized in the image-level representation can be easily recognized in the feature-level representation. The red/yellow dotted boxes highlight normal/abnormal areas. "GT", "Rec.", "Inp.", and "Fea. Diff." denote "Ground-Truth", "Reconstructed", "Inpainting", and "Feature-level Differences", respectively.
  • Figure 2: Overview of the proposed ReDi framework. Given an input image $X$, the self-generated map $H$ (e.g., HOG) is first extracted and the prompt image is sampled from the set of normal images $X^+$. The recovery network then uses the self-generated map and the prompt image to generate the recovered image $Y$. The reference branch extracts the features of $X$ as the reference feature $F_X$. The recovery branch first extracts the feature $R_Y$ of the recovered image $Y$, and then feeds it to the Feature Recovery Block to generate the recovery feature $F_Y$. In the inference phase, anomalous regions are inferred from $F_X$ and $F_Y$. $L_{Rec}$ and $L_{Dis}$ denote the loss functions of the two networks, respectively.
  • Figure 3: Comparison of the anomaly segmentation results of the proposed ReDi with RD. ReDi has a more powerful anomaly segmentation ability, effectively segmenting tiny anomaly regions.
  • Figure 4: Samples with labeling inaccuracies. "Zoom" denotes the zoomed-in view of the yellow boxes. Our ReDi accurately segments unpredictable and subtle anomalous objects.
  • Figure 5: Recovery results for four different self-generated maps in the Recover Network. The top row shows the input/recovered images, and the bottom row shows the GT/self-generated maps.
  • ...and 3 more figures
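
The Figure 2 caption states that anomalous regions are inferred from $F_X$ and $F_Y$ at inference time. One common way to turn two sets of multi-scale features into a pixel-level map is to sum per-scale cosine-distance maps after resizing them to a common resolution. The sketch below assumes that scheme; the excerpt does not confirm the exact scoring rule, and the names `anomaly_map` and `upsample_nearest`, as well as the nearest-neighbour resizing, are illustrative choices rather than details from the paper.

```python
import numpy as np

def upsample_nearest(a, size):
    """Nearest-neighbour resize of a 2-D map to (size, size)."""
    rows = np.arange(size) * a.shape[0] // size
    cols = np.arange(size) * a.shape[1] // size
    return a[np.ix_(rows, cols)]

def anomaly_map(ref_feats, rec_feats, out_size, eps=1e-8):
    """Sum of per-scale cosine-distance maps between reference and recovery features.

    ref_feats, rec_feats: lists of arrays shaped (C, H, W), one pair per scale,
    standing in for F_X and F_Y. Assumed scoring rule, not the paper's.
    """
    total = np.zeros((out_size, out_size))
    for f_x, f_y in zip(ref_feats, rec_feats):
        dot = (f_x * f_y).sum(axis=0)                                   # (H, W)
        norm = np.linalg.norm(f_x, axis=0) * np.linalg.norm(f_y, axis=0)
        distance = 1.0 - dot / (norm + eps)                             # 0 = aligned, 2 = opposite
        total += upsample_nearest(distance, out_size)
    return total
```

Under this assumption, thresholding the returned map gives a segmentation mask, and a simple image-level score can be taken as its maximum value.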