Advancing Pre-trained Teacher: Towards Robust Feature Discrepancy for Anomaly Detection

Canhui Tang; Sanping Zhou; Yizhe Li; Yonghao Dong; Le Wang

TL;DR

The paper addresses industrial anomaly detection under scarce abnormal data by strengthening both the teacher’s discriminative power and the student’s normality reconstruction. It introduces AAND, a two-stage framework with Anomaly Amplification (RAA featuring a Matching-guided Residual Gate and an Attribute-scaling Residual Generator) and Normality Distillation (reverse distillation with Hard Knowledge Distillation). The approach yields robust teacher–student feature discrepancy, achieving state-of-the-art results on VisA and MVTec3D-RGB and competitive gains on MVTec AD. This framework enhances practical anomaly detection by better distinguishing subtle anomalies while preserving normal data integrity, with potential extensions to multimodal data and more realistic anomaly synthesis.

Abstract

With the wide application of knowledge distillation between an ImageNet pre-trained teacher model and a learnable student model, industrial anomaly detection has made significant progress in the past few years. The success of knowledge distillation mainly relies on maintaining the feature discrepancy between the teacher and student models, which rests on two assumptions: (1) the teacher model can jointly represent two different distributions for the normal and abnormal patterns, while (2) the student model can only reconstruct the normal distribution. However, maintaining these ideal assumptions in practice remains challenging. In this paper, we propose a simple yet effective two-stage industrial anomaly detection framework, termed AAND, which sequentially performs Anomaly Amplification and Normality Distillation to obtain a robust feature discrepancy. In the first (anomaly amplification) stage, we propose a novel Residual Anomaly Amplification (RAA) module to advance the pre-trained teacher encoder. Exposed to synthetic anomalies, it amplifies anomalies via residual generation while maintaining the integrity of the pre-trained model. It mainly comprises a Matching-guided Residual Gate and an Attribute-scaling Residual Generator, which determine the residuals' proportion and characteristics, respectively. In the second (normality distillation) stage, we further employ a reverse distillation paradigm to train a student decoder, in which a novel Hard Knowledge Distillation (HKD) loss is built to better facilitate the reconstruction of normal patterns. Comprehensive experiments on the MVTec AD, VisA, and MVTec3D-RGB datasets show that our method achieves state-of-the-art performance.
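The abstract describes RAA only at a high level: a gate decides how much residual is added to each teacher feature, a generator decides what that residual looks like, and the teacher feature itself passes through untouched. The sketch below illustrates that gated-residual pattern, F_A = F_T + g · R(F_T), on toy per-patch vectors. It is not the paper's formulation: the gate (one minus the best cosine match against a bank of normal features), the per-channel `scale` standing in for a learned generator, and the names `residual_gate`, `generate_residual`, `amplify`, and `normal_bank` are all hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; the small epsilon guards against zero-norm vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def residual_gate(patch: Vector, normal_bank: List[Vector]) -> float:
    """Hypothetical matching-guided gate in [0, 1].

    The gate is the mismatch between a teacher patch and its best match in a
    bank of normal features: a well-matched (normal-looking) patch gets a gate
    near 0, a poorly matched patch gets a gate near 1.
    """
    best_match = max(cosine(patch, ref) for ref in normal_bank)
    return min(max(1.0 - best_match, 0.0), 1.0)


def generate_residual(patch: Vector, scale: Vector) -> Vector:
    """Hypothetical attribute-scaling generator: per-channel scaling of the patch.

    `scale` is a fixed stand-in for learned parameters; the paper's generator
    is learned and its exact form is not reproduced here.
    """
    return [s * x for s, x in zip(scale, patch)]


def amplify(features: List[Vector], normal_bank: List[Vector], scale: Vector) -> List[Vector]:
    """F_A = F_T + g * R(F_T): identity path plus a gated residual per patch."""
    amplified = []
    for patch in features:
        g = residual_gate(patch, normal_bank)
        residual = generate_residual(patch, scale)
        amplified.append([x + g * r for x, r in zip(patch, residual)])
    return amplified


if __name__ == "__main__":
    bank = [[1.0, 0.0]]                  # toy bank holding one "normal" feature
    teacher = [[1.0, 0.0], [0.0, 1.0]]   # one normal-like patch, one unmatched patch
    # The matched patch stays close to [1.0, 0.0]; the unmatched one becomes [0.0, 1.5].
    print(amplify(teacher, bank, scale=[0.5, 0.5]))
```

Because the teacher feature enters through an identity path, a gate of zero returns it unchanged. This is one way to read "maintaining the integrity of the pre-trained model": the residual only perturbs features that the gate flags.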

Paper Structure (16 sections, 14 equations, 8 figures, 8 tables, 1 algorithm)

This paper contains 16 sections, 14 equations, 8 figures, 8 tables, 1 algorithm.

Introduction
Related Work
Method
Overview.
Anomaly Amplification Stage
Matching-guided residual gate
Attribute-scaling residual generator
Normality Distillation Stage
Inference
Experiments
Dataset
Implementation Details
Main Results
Ablation Study
Visualization Analysis

Figures (8)

Figure 1: Illustration of Teacher-Student Feature Discrepancy, whose success relies on maintaining the feature discrepancy between the teacher and student models. This line of methods rests on two assumptions: (1) the teacher encoder can jointly represent two different distributions for normal and abnormal patterns, while (2) the student decoder can only reconstruct the normal distribution.
Figure 2: Overview of our proposed AAND. I. Anomaly Amplification Stage: the vanilla teacher model encodes the input image into K-level features $\mathbf{F}_{T_k}$, which are then advanced to $\mathbf{F}_{A_k}$ through our proposed Residual Anomaly Amplification (RAA) module. The RAA module effectively amplifies anomalies while preserving the integrity of the pre-trained model. It comprises a matching-guided residual gate and an attribute-scaling residual generator, which determine the proportion and characteristics of the residuals, respectively. II. Normality Distillation Stage: a student model decodes features $\mathbf{F}_{S_k}$, which are trained to distill the representation of the advanced teacher on normal samples only, where "BN" denotes a bottleneck module and $\mathcal{L}_{HKD}$ represents our proposed Hard Knowledge Distillation loss. During inference, the "teacher-student discrepancy" is used for anomaly detection.
Figure 3: Illustration of our proposed RAA module. It utilizes a residual learning mechanism to amplify anomalies while maintaining the integrity of the pre-trained model.
Figure 4: Illustration of our proposed HKD loss. It facilitates the reconstruction of challenging normal patterns by selecting the $K_h$ normal patches with the highest distillation loss for further training.
Figure 5: Qualitative results for anomaly localization, where "A-score" denotes the maximum value in the anomaly score map. Compared to RD and RD++, our method can accurately localize anomalies even in challenging cases where the abnormal region is extremely small or the anomaly looks very similar to normal data.
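Figures 1, 4, and 5 together describe the two signals built on the teacher-student discrepancy: during training, HKD keeps only the $K_h$ normal patches with the highest distillation loss, and at inference the discrepancy forms an anomaly map whose maximum is the "A-score". The sketch below shows both on toy per-patch vectors. It assumes a per-patch cosine distance 1 - cos(F_A, F_S) as the distillation loss (as in RD-style reverse distillation), a plain mean over the selected hard patches, and feature levels that already share one resolution; the function names are hypothetical, and only the forward computation is shown (training would backpropagate the HKD value through the student).

```python
import math
from typing import List

Vector = List[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity between a teacher patch and a student patch."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + 1e-8)


def patch_losses(teacher: List[Vector], student: List[Vector]) -> List[float]:
    """Per-patch distillation loss for one feature level (patches aligned by index)."""
    return [cosine_distance(t, s) for t, s in zip(teacher, student)]


def hard_kd_loss(teacher: List[Vector], student: List[Vector], k_h: int) -> float:
    """Hypothetical HKD: average only the k_h patches with the highest loss."""
    if k_h < 1:
        raise ValueError("k_h must be at least 1")
    losses = sorted(patch_losses(teacher, student), reverse=True)
    hardest = losses[:k_h]
    return sum(hardest) / len(hardest)


def anomaly_map(teacher_levels: List[List[Vector]],
                student_levels: List[List[Vector]]) -> List[float]:
    """Sum per-patch discrepancies over feature levels.

    Assumes every level already has the same number of patches; a real
    pipeline would first upsample each level's map to the input resolution.
    """
    maps = [patch_losses(t, s) for t, s in zip(teacher_levels, student_levels)]
    return [sum(values) for values in zip(*maps)]


def a_score(teacher_levels: List[List[Vector]],
            student_levels: List[List[Vector]]) -> float:
    """Image-level score: the maximum value of the anomaly map."""
    return max(anomaly_map(teacher_levels, student_levels))


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    student = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]  # patch 1 is poorly reconstructed
    print(hard_kd_loss(teacher, student, k_h=1))     # loss driven by the hardest patch only
    print(a_score([teacher, teacher], [student, student]))
```

Averaging only the top-$K_h$ losses keeps well-reconstructed patches from diluting the gradient, which matches the stated goal of HKD: pushing the student to reconstruct the challenging normal patterns it still gets wrong.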