Dual Distillation for Few-Shot Anomaly Detection

Le Dong; Qinzhong Tan; Chunlei Li; Jingliang Hu; Yilei Shi; Weisheng Dong; Xiao Xiang Zhu; Lichao Mou

TL;DR

This paper introduces D$^2$4FAD, a novel dual distillation framework for few-shot anomaly detection that identifies anomalies in previously unseen tasks using only a small number of normal reference images. It also proposes a learn-to-weight mechanism that dynamically assesses the reference value of each support image conditioned on the query, improving anomaly detection performance.

Abstract

Anomaly detection is a critical task in computer vision with profound implications for medical imaging, where identifying pathologies early can directly impact patient outcomes. While recent unsupervised anomaly detection approaches show promise, they require substantial normal training data and struggle to generalize across anatomical contexts. We introduce D$^2$4FAD, a novel dual distillation framework for few-shot anomaly detection that identifies anomalies in previously unseen tasks using only a small number of normal reference images. Our approach uses a pre-trained encoder as a teacher network to extract multi-scale features from both support and query images, while a student decoder learns to distill knowledge from the teacher on query images and to self-distill on support images. We further propose a learn-to-weight mechanism that dynamically assesses the reference value of each support image conditioned on the query, improving anomaly detection performance. To evaluate our method, we curate a comprehensive benchmark dataset comprising 13,084 images across four organs, four imaging modalities, and five disease categories. Extensive experiments demonstrate that D$^2$4FAD significantly outperforms existing approaches, establishing a new state of the art in few-shot medical anomaly detection. Code is available at https://github.com/ttttqz/D24FAD.
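The abstract says the student decoder distills the frozen teacher's multi-scale features on query images, but this page does not state the loss. The following is a minimal NumPy sketch assuming the per-location cosine-distance loss common in reverse-distillation-style detectors, averaged per scale and summed over scales. The function names `cosine_distance_map` and `distillation_loss` are hypothetical, the feature shapes are illustrative, and the support-image self-distillation term is omitted because its exact form is not given here.

```python
import numpy as np


def cosine_distance_map(teacher_feat: np.ndarray, student_feat: np.ndarray,
                        eps: float = 1e-8) -> np.ndarray:
    """Per-location (1 - cosine similarity) between two (C, H, W) feature maps."""
    dot = np.sum(teacher_feat * student_feat, axis=0)            # (H, W)
    norms = (np.linalg.norm(teacher_feat, axis=0)
             * np.linalg.norm(student_feat, axis=0))             # (H, W)
    return 1.0 - dot / (norms + eps)


def distillation_loss(teacher_feats: list[np.ndarray],
                      student_feats: list[np.ndarray]) -> float:
    """Assumed loss: mean cosine distance per scale, summed over scales."""
    if len(teacher_feats) != len(student_feats):
        raise ValueError("teacher and student must provide the same number of scales")
    total = 0.0
    for t, s in zip(teacher_feats, student_feats):
        if t.shape != s.shape:
            raise ValueError(f"shape mismatch: {t.shape} vs {s.shape}")
        total += float(cosine_distance_map(t, s).mean())
    return total


# Usage with random stand-in "teacher" features at three hypothetical scales.
rng = np.random.default_rng(0)
teacher = [rng.standard_normal((c, h, h)) for c, h in [(64, 32), (128, 16), (256, 8)]]
print(distillation_loss(teacher, teacher))                # perfect student: close to 0
print(distillation_loss(teacher, [-t for t in teacher]))  # opposite features: close to 3 * 2 = 6
```

A cosine distance ignores feature magnitude, so the student is penalized only for pointing in a different direction from the teacher at each location; whether D$^2$4FAD uses exactly this form should be checked against the paper.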

Paper Structure (34 sections, 9 equations, 6 figures, 7 tables)

Introduction
Methodology
Problem Definition
Dual Distillation
Teacher-Student Distillation
Student Self-Distillation
Training Objective
Learn-to-Weight
Anomaly Scoring
Related Work
Experiments
Experimental Settings
Datasets
Implementation Details
Competing Methods

Figures (6)

Figure 1: Overview of our dual distillation framework for few-shot anomaly detection. The architecture incorporates a frozen pre-trained teacher encoder and a learnable student decoder. During training, the teacher encoder processes both query and support images, while the student learns to reconstruct multi-scale feature representations through our proposed dual distillation approach. At inference time, for previously unseen tasks, anomalies are identified by analyzing discrepancies between query and support image features in the student network. In addition, we introduce a learn-to-weight mechanism that enhances model performance by dynamically assessing the reference value of each support image relative to a specific query (cf. the Learn-to-Weight section).
Figure 2: Distribution of abnormality scores for normal (blue) and abnormal (red) samples across five datasets, visualized using raincloud plots and boxplots. For each dataset, the left subplot presents results without the learn-to-weight mechanism, while the right subplot shows results with the mechanism applied. Reduced overlap between the red and blue distributions indicates better discrimination. Each boxplot displays the median and interquartile range (IQR), with whiskers extending to the most extreme values within 1.5$\times$ IQR of the quartiles; comparing the left and right subplots illustrates the effect of the learn-to-weight mechanism.
Figure 3: Performance comparison of anomaly detection methods across three dimensions: AUROC score (vertical axis), inference time (horizontal axis), and memory footprint (circle radius).
Figure 4: t-SNE visualization of embeddings from normal and abnormal samples in the RSNA dataset, extracted from the student network. Left: embeddings without teacher-student distillation. Right: embeddings with teacher-student distillation applied.
Figure 5: Visualization of exemplar anomaly maps generated by the proposed model.
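Figure 1 states that, at inference, anomalies are found from discrepancies between query and support features in the student network, with each support image weighted relative to the query. The sketch below illustrates that flow under explicit assumptions: student feature maps are NumPy arrays of shape (C, H, W); each query patch is compared with its nearest support patch by cosine distance; a softmax over the cosine similarity of globally averaged features, with a hypothetical `temperature` parameter, stands in for the learned weighting (it is not the paper's learn-to-weight network); and the image-level score is the maximum of the weighted anomaly map. All function names are hypothetical.

```python
import numpy as np


def _l2_normalize(x: np.ndarray, axis: int, eps: float = 1e-8) -> np.ndarray:
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def patch_distance_map(query_feat: np.ndarray, support_feat: np.ndarray) -> np.ndarray:
    """For each query location: 1 - max cosine similarity to any support location."""
    c, h, w = query_feat.shape
    q = _l2_normalize(query_feat.reshape(c, -1).T, axis=1)    # (HW, C)
    s = _l2_normalize(support_feat.reshape(c, -1).T, axis=1)  # (HW_s, C)
    sim = q @ s.T                                             # (HW, HW_s)
    return (1.0 - sim.max(axis=1)).reshape(h, w)


def support_weights(query_feat: np.ndarray, support_feats: list[np.ndarray],
                    temperature: float = 10.0) -> np.ndarray:
    """Hypothetical stand-in for learned weights: softmax of global cosine similarity."""
    q = _l2_normalize(query_feat.mean(axis=(1, 2)), axis=0)                          # (C,)
    s = _l2_normalize(np.stack([f.mean(axis=(1, 2)) for f in support_feats]), axis=1)  # (K, C)
    logits = temperature * (s @ q)
    logits -= logits.max()  # numerical stability
    w = np.exp(logits)
    return w / w.sum()


def anomaly_map_and_score(query_feat, support_feats, temperature=10.0):
    weights = support_weights(query_feat, support_feats, temperature)
    maps = np.stack([patch_distance_map(query_feat, f) for f in support_feats])  # (K, H, W)
    amap = np.tensordot(weights, maps, axes=1)  # weighted sum over supports -> (H, W)
    return amap, float(amap.max()), weights


# Usage: the query equals support image 0 except for one "anomalous" patch.
rng = np.random.default_rng(0)
support = [rng.standard_normal((32, 8, 8)) for _ in range(4)]
query = support[0].copy()
query[:, 2, 3] = rng.standard_normal(32)
amap, score, weights = anomaly_map_and_score(query, support)
print(np.round(weights, 3))                           # mass should concentrate on support 0
print(np.unravel_index(np.argmax(amap), amap.shape))  # peak expected at the altered patch (2, 3)
```

Nearest-neighbor patch matching is used here because query and support images may not be pixel-aligned; replacing `support_weights` with a small network trained to score support images given the query would bring the sketch closer to the learn-to-weight idea described above.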
