Unlocking the Potential of Reverse Distillation for Anomaly Detection

Xinyue Liu; Jianyuan Wang; Biao Leng; Shuo Zhang

TL;DR

This work extends Reverse Distillation (RD) for unsupervised anomaly detection with RD with Expert (RD-E), an Expert-Teacher-Student framework, and a Guided Information Injection (GII) module. The expert network distills the teacher and the student jointly, which sharpens the teacher's sensitivity to anomalies and denoises the student’s features, while GII injects high-level teacher information in a controlled, similarity-guided manner to recover fine details without leaking anomalies. Experiments on MVTec AD, MPDD, BTAD, and VisA show state-of-the-art or competitive results in both anomaly detection and localization, and ablations confirm the contribution of RD-E and GII. The approach reduces missed detections and false positives, which makes it relevant to unsupervised anomaly detection in industrial settings.

Abstract

Knowledge Distillation (KD) is a promising approach for unsupervised Anomaly Detection (AD). However, the student network's over-generalization often diminishes the crucial representation differences between teacher and student in anomalous regions, leading to detection failures. To address this problem, the widely accepted Reverse Distillation (RD) paradigm designs an asymmetric teacher and student, using an encoder as the teacher and a decoder as the student. Yet the design of RD does not ensure that the teacher encoder effectively distinguishes between normal and abnormal features, or that the student decoder generates anomaly-free features. Additionally, the absence of skip connections results in a loss of fine details during feature reconstruction. To address these issues, we propose RD with Expert, which introduces a novel Expert-Teacher-Student network for simultaneous distillation of both the teacher encoder and the student decoder. The added expert network enhances the student's ability to generate normal features and optimizes the teacher's differentiation between normal and abnormal features, reducing missed detections. Additionally, Guided Information Injection is designed to filter and transfer features from teacher to student, improving detail reconstruction and minimizing false positives. Experiments on several benchmarks show that our method outperforms existing unsupervised AD methods under the RD paradigm, fully unlocking RD's potential.
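The abstract relies on the idea that anomalies show up as representation differences between teacher and student features. The sketch below illustrates that idea only, and it is not the authors' implementation. It assumes that features are given as an H x W grid of C-dimensional vectors and that cosine distance is the discrepancy measure; the function names `cosine_distance`, `anomaly_map`, and `image_score` are hypothetical, and the expert network and GII module are not modeled.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero vector is treated as maximally dissimilar.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def anomaly_map(teacher_feats, student_feats):
    """Per-location discrepancy between teacher and student feature grids.

    Both inputs are H x W grids (lists of rows) of C-dimensional vectors.
    A high value means the student failed to reproduce the teacher's
    feature at that location, which is read as evidence of an anomaly.
    """
    return [
        [cosine_distance(t, s) for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(teacher_feats, student_feats)
    ]


def image_score(score_map):
    """Image-level score, assumed here to be the maximum of the map."""
    return max(max(row) for row in score_map)


# Toy 1 x 2 grid: the student matches the teacher at the first location
# and produces an orthogonal feature at the second.
teacher = [[[1.0, 0.0], [0.0, 1.0]]]
student = [[[1.0, 0.0], [1.0, 0.0]]]
scores = anomaly_map(teacher, student)
```

In this toy input the first location scores near zero and the second scores near one, so only the second would be flagged. The over-generalization problem described above corresponds to the student also reproducing the teacher's features on anomalous regions, which would pull the second score back toward zero.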
