Dual-Student Knowledge Distillation Networks for Unsupervised Anomaly Detection

Liyi Yao; Shaobing Gao

TL;DR

This work tackles unsupervised industrial anomaly detection under data imbalance and diverse defect types by proposing Dual-Student Knowledge Distillation (DSKD), in which a fixed teacher guides two students with inverted structures (Se and Sd) to strengthen normal-pattern consistency while boosting anomaly representation. The method applies multi-scale distillation to intermediate feature maps and uses a deep feature (DF) embedding bottleneck to fuse semantic information and promote diverse anomaly cues; anomaly inference is driven by pixel-level discrepancies across scales. Experiments on MVTec AD, MVTec 3D-AD, and MT Defects demonstrate strong image- and pixel-level detection and localization performance with low computational complexity, and comprehensive ablations validate the contributions of the dual-student architecture, the DF embedding, and multi-scale fusion. Overall, DSKD advances unsupervised AD by balancing robust alignment on normal data with enhanced sensitivity to anomalous patterns, offering practical impact for efficient industrial inspection and potential extension to 3D data.

Abstract

Due to data imbalance and the diversity of defects, student-teacher (S-T) networks are favored in unsupervised anomaly detection; they recognize anomalies through the discrepancy in feature representations produced by the knowledge distillation process. However, the vanilla S-T network is unstable. Using identical structures for the student and the teacher may weaken the representational discrepancy on anomalies, whereas using different structures increases the likelihood of divergent behavior on normal data. To address this problem, we propose a novel dual-student knowledge distillation (DSKD) architecture. Unlike other S-T networks, DSKD uses two student networks and a single pre-trained teacher network, where the students have the same scale but inverted structures. This framework can enhance the distillation effect to improve consistency in the recognition of normal data, while simultaneously introducing diversity in anomaly representation. To exploit high-dimensional semantic information for capturing anomaly clues, we employ two strategies. First, a pyramid matching mode performs knowledge distillation on multi-scale feature maps from the intermediate layers of the networks. Second, the two student networks interact through a deep feature embedding module, which is inspired by real-world group discussions. For classification, we obtain pixel-wise anomaly segmentation maps by measuring the discrepancy between the output feature maps of the teacher and student networks, and compute an anomaly score from these maps for sample-wise decisions. We evaluate DSKD on three benchmark datasets and probe the effects of its internal modules through ablation experiments. The results demonstrate that DSKD achieves exceptional performance with small models such as ResNet18 and effectively improves vanilla S-T networks.
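
The anomaly-inference step described above (a per-pixel teacher-student discrepancy at several scales, fused into one segmentation map and then reduced to a sample-level score) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes cosine distance as the discrepancy measure, nearest-neighbour upsampling, summation over scales and over both students, and the maximum of the fused map as the anomaly score. It also assumes that the same distance, averaged over pixels, can serve as a pyramid-matching training loss. All function names are hypothetical, and feature maps are plain nested lists of shape [C][H][W] so the sketch runs without dependencies.

```python
import math

# Feature maps are nested lists of shape [C][H][W].
# Assumptions (not from the paper): cosine distance, nearest upsampling,
# summation over scales and students, max of the map as the image score.


def cosine_distance_map(f_t, f_s, eps=1e-8):
    """Per-pixel 1 - cos(teacher, student) over channels: [C][H][W] -> [H][W]."""
    C, H, W = len(f_t), len(f_t[0]), len(f_t[0][0])
    out = []
    for i in range(H):
        row = []
        for j in range(W):
            dot = sum(f_t[c][i][j] * f_s[c][i][j] for c in range(C))
            n_t = math.sqrt(sum(f_t[c][i][j] ** 2 for c in range(C)))
            n_s = math.sqrt(sum(f_s[c][i][j] ** 2 for c in range(C)))
            row.append(1.0 - dot / (n_t * n_s + eps))
        out.append(row)
    return out


def upsample_nearest(m, size):
    """Nearest-neighbour resize of an [H][W] map to [size][size]."""
    H, W = len(m), len(m[0])
    return [[m[i * H // size][j * W // size] for j in range(size)] for i in range(size)]


def pyramid_distillation_loss(teacher_feats, students_feats):
    """Hypothetical training loss: mean distance per scale, summed over scales and students."""
    loss = 0.0
    for student_feats in students_feats:
        for f_t, f_s in zip(teacher_feats, student_feats):
            d = cosine_distance_map(f_t, f_s)
            loss += sum(map(sum, d)) / (len(d) * len(d[0]))
    return loss


def anomaly_map(teacher_feats, students_feats, out_size):
    """Fuse upsampled discrepancy maps from every scale and every student by summation."""
    total = [[0.0] * out_size for _ in range(out_size)]
    for student_feats in students_feats:
        for f_t, f_s in zip(teacher_feats, student_feats):
            m = upsample_nearest(cosine_distance_map(f_t, f_s), out_size)
            for i in range(out_size):
                for j in range(out_size):
                    total[i][j] += m[i][j]
    return total


def anomaly_score(amap):
    """Sample-level score: the maximum pixel value of the fused map."""
    return max(max(row) for row in amap)


# Toy example: two scales, two students; student B disagrees at one pixel.
t1 = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]   # 2x2, every pixel = (1, 0)
sb1 = [[[0.0, 1.0], [1.0, 1.0]], [[1.0, 0.0], [0.0, 0.0]]]  # pixel (0, 0) = (0, 1)
t2 = [[[1.0]], [[0.0]]]                                      # 1x1 coarse scale
amap = anomaly_map([t1, t2], [[t1, t2], [sb1, t2]], out_size=4)
score = anomaly_score(amap)
loss = pyramid_distillation_loss([t1, t2], [[t1, t2], [sb1, t2]])
print(round(score, 4), round(loss, 4))  # 1.0 0.25
```

In the toy example, the disagreeing pixel at the finer scale lights up the top-left 2x2 block of the 4x4 map, while every location where both students match the teacher stays near zero. Summation is only one way to fuse the two students; averaging or multiplying the per-student maps would be equally plausible choices under this sketch.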

Paper Structure (24 sections, 10 equations, 9 figures, 9 tables, 2 algorithms)

Introduction
Related Works
Classical Methods
Reconstruction-based Methods
Deep Feature Embedding-based Methods
Proposed Method
Dual-Student Knowledge Distillation
Deep Feature Embedding
Anomaly Inference
Experimental results
Experiment Settings
Datasets
Implement and Environment Details
Evaluation Criteria
Results and Discussion
...and 9 more sections

Figures (9)

Figure 1: Examples of anomaly localization or segmentation. From the first to the third row, the anomaly samples, the anomaly maps generated by our proposed model, and the ground truth are shown, respectively.
Figure 2: Principles of knowledge distillation in anomaly detection. In the left figure, the student and the teacher have similar representations of anomaly-free patterns but differ significantly on anomalies. Based on this difference, the anomaly scores are calculated as shown in the right figure.
Figure 3: The framework of the proposed dual-student knowledge distillation (DSKD) model.
Figure 4: Deep feature embedding process. Feature maps from different layers are resized to the same scale and then downsampled by convolutional modules. The resulting embedding carries rich semantic information from different intermediate layers.
Figure 5: Visualization of anomaly localization results on the 15 categories of MVTec AD. The first, second, and third rows show the original defective images, the anomaly maps, and the ground truth, respectively.
...and 4 more figures
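
The deep feature embedding step in the Figure 4 caption (feature maps from different intermediate layers resized to one scale, then downsampled) can be illustrated with a small sketch. It rests on assumptions not stated in the source: nearest-neighbour resizing, channel-wise concatenation of the resized maps, and 2x2 average pooling as a stand-in for the learned strided convolutional modules. Which network's layers feed the module is not specified here, and the function names are hypothetical.

```python
# Hypothetical sketch of the Figure 4 fusion step. Assumptions: nearest-neighbour
# resizing, channel concatenation, and 2x2 average pooling as a stand-in for the
# learned convolutional downsampling modules described in the caption.


def resize_nearest(fmap, size):
    """Resize every channel of a [C][H][W] map to [C][size][size]."""
    out = []
    for ch in fmap:
        H, W = len(ch), len(ch[0])
        out.append([[ch[i * H // size][j * W // size] for j in range(size)] for i in range(size)])
    return out


def avg_pool2x2(fmap):
    """Halve the spatial size; placeholder for a learned stride-2 convolution."""
    out = []
    for ch in fmap:
        H, W = len(ch), len(ch[0])
        out.append([[(ch[2 * i][2 * j] + ch[2 * i][2 * j + 1]
                      + ch[2 * i + 1][2 * j] + ch[2 * i + 1][2 * j + 1]) / 4.0
                     for j in range(W // 2)] for i in range(H // 2)])
    return out


def deep_feature_embedding(features, size):
    """Resize multi-layer features to one scale, concatenate channels, downsample."""
    fused = []
    for fmap in features:
        fused.extend(resize_nearest(fmap, size))  # channel-wise concatenation
    return avg_pool2x2(fused)


shallow = [[[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0],
            [8.0, 9.0, 10.0, 11.0], [12.0, 13.0, 14.0, 15.0]]]  # 1 channel, 4x4
deep = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]       # 2 channels, 2x2
emb = deep_feature_embedding([shallow, deep], size=2)
print(emb)  # [[[5.0]], [[1.0]], [[2.0]]]
```

The output stacks one channel from the shallow layer and two from the deep layer at a single spatial scale, which is the property the caption emphasizes: one compact embedding carrying information from several intermediate layers. In a trained model the pooling would be replaced by learned convolutions, so the channel count and values would differ.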
