Unlocking the Potential of Reverse Distillation for Anomaly Detection

Xinyue Liu, Jianyuan Wang, Biao Leng, Shuo Zhang

TL;DR

This work extends Reverse Distillation for unsupervised anomaly detection by introducing an Expert-Teacher-Student (RD-E) framework and a Guided Information Injection (GII) module. The expert network jointly distills both the teacher and the student to enhance anomaly sensitivity and denoise the student’s features, while GII injects high-level teacher information in a controlled, similarity-guided manner to recover fine details without leaking anomalies. Experiments on MVTec AD, MPDD, BTAD, and VisA demonstrate state-of-the-art or competitive results in both anomaly detection and localization, with ablations confirming the benefits of RD-E and GII. The proposed approach improves robustness to missed detections and reduces false positives, offering a practically impactful advancement for unsupervised anomaly detection in industrial settings.

Abstract

Knowledge Distillation (KD) is a promising approach for unsupervised Anomaly Detection (AD). However, the student network's over-generalization often diminishes the crucial representation differences between teacher and student in anomalous regions, leading to detection failures. To address this problem, the widely accepted Reverse Distillation (RD) paradigm designs an asymmetric teacher and student, using an encoder as teacher and a decoder as student. Yet, the design of RD does not ensure that the teacher encoder effectively distinguishes between normal and abnormal features or that the student decoder generates anomaly-free features. Additionally, the absence of skip connections results in a loss of fine details during feature reconstruction. To address these issues, we propose RD with Expert, which introduces a novel Expert-Teacher-Student network for simultaneous distillation of both the teacher encoder and student decoder. The added expert network enhances the student's ability to generate normal features and optimizes the teacher's differentiation between normal and abnormal features, reducing missed detections. Additionally, Guided Information Injection is designed to filter and transfer features from teacher to student, improving detail reconstruction and minimizing false positives. Experiments on several benchmarks prove that our method outperforms existing unsupervised AD methods under the RD paradigm, fully unlocking RD's potential.
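The teacher-student representation difference that the abstract relies on can be illustrated with a small sketch. It assumes the discrepancy is measured as a per-location cosine distance between feature maps, which matches the cosine distance maps mentioned in the Figure 4 caption; the function name `cosine_distance_map`, the feature shapes, and the max-based image score are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def cosine_distance_map(teacher_feat, student_feat, eps=1e-8):
    """Per-location cosine distance between two (C, H, W) feature maps.

    Hypothetical helper: returns an (H, W) map that is near 0 where the
    two feature vectors point the same way and larger where they differ.
    """
    dot = (teacher_feat * student_feat).sum(axis=0)
    norm_t = np.linalg.norm(teacher_feat, axis=0)
    norm_s = np.linalg.norm(student_feat, axis=0)
    return 1.0 - dot / (norm_t * norm_s + eps)


# Toy data (assumption): the student reproduces the teacher everywhere
# except in a 2x2 patch, which stands in for an anomalous region.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(16, 8, 8))
student = teacher.copy()
student[:, 2:4, 2:4] = rng.normal(size=(16, 2, 2))

amap = cosine_distance_map(teacher, student)  # pixel-level anomaly map
image_score = amap.max()                      # one simple image-level score
```

In this toy setup the map is close to zero wherever the student matches the teacher and clearly positive inside the patch. Over-generalization, in these terms, would mean the student also reproduces the teacher inside the patch, so the map stays flat and the anomaly is missed.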

Paper Structure

This paper contains 38 sections, 7 equations, 14 figures, 12 tables, 1 algorithm.

Figures (14)

  • Figure 1: Anomaly localization examples. Our method reduces missed detections and false positives in RD.
  • Figure 2: Schematic diagram of the framework and data flow of RD and its variants including our proposed method.
  • Figure 3: Overview of our proposed method. (a) shows the overall architecture and training process of our designed Expert-Teacher-Student Network, where the expert is frozen and the teacher and student are trainable. Our proposed Guided Information Injection module is inserted between the two blocks of the student. (b) shows how to distill the teacher and student with the expert. The impact of distillation on the teacher and student features is visually represented. Through two sub-tasks, making the teacher encoder more sensitive to anomalies and better denoising the student features, differences between teacher and student features are achieved in anomalous regions, while similarities in normal regions are maintained.
  • Figure 4: (a) Cosine distance maps between features of teacher and student. (b) Guided Information Injection.
  • Figure 5: Inference procedure of our proposed method. The expert is removed and both teacher and student are frozen.
  • ...and 9 more figures