Scale-Aware Contrastive Reverse Distillation for Unsupervised Medical Anomaly Detection

Chunlei Li, Yilei Shi, Jingliang Hu, Xiao Xiang Zhu, Lichao Mou

TL;DR

We address unsupervised medical anomaly detection with a scale-aware contrastive reverse distillation framework that pairs a clean and a noisy teacher encoder with a student decoder. A scale adaptation module learns per-scale weights to handle variation in anomaly size, while anomalies synthesized with simplex noise supply out-of-normal examples that make the learned representations more discriminative. On RSNA, Brain Tumor MRI, and ISIC 2018, the method reports state-of-the-art AUC, F1, and accuracy, and ablations confirm the contributions of contrastive reverse distillation (CRD) and scale weighting. The approach also offers efficient inference and robust handling of multi-scale anomalies, and the code is released for reproducibility.

Abstract

Unsupervised anomaly detection using deep learning has garnered significant research attention due to its broad applicability, particularly in medical imaging where labeled anomalous data are scarce. While earlier approaches leverage generative models like autoencoders and generative adversarial networks (GANs), they often fall short due to overgeneralization. Recent methods explore various strategies, including memory banks, normalizing flows, self-supervised learning, and knowledge distillation, to enhance discrimination. Among these, knowledge distillation, particularly reverse distillation, has shown promise. Following this paradigm, we propose a novel scale-aware contrastive reverse distillation model that addresses two key limitations of existing reverse distillation methods: insufficient feature discriminability and inability to handle anomaly scale variations. Specifically, we introduce a contrastive student-teacher learning approach to derive more discriminative representations by generating and exploring out-of-normal distributions. Further, we design a scale adaptation mechanism to softly weight contrastive distillation losses at different scales to account for the scale variation issue. Extensive experiments on benchmark datasets demonstrate state-of-the-art performance, validating the efficacy of the proposed method. Code is available at https://github.com/MedAITech/SCRD4AD.
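The idea of softly weighting per-scale contrastive distillation losses can be illustrated with a small sketch. It is not the authors' implementation: the cosine-distance form of the loss, the hinge `margin`, the softmax over `scale_logits`, and all function names are assumptions made for illustration only. Each scale's features are represented as a list of per-location channel vectors.

```python
import math

def cosine_distance(u, v, eps=1e-8):
    """Cosine distance 1 - cos(u, v) between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm + eps)

def softmax(logits):
    """Numerically stable softmax; turns raw scale scores into soft weights."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def scale_weighted_contrastive_loss(student, clean, noisy, scale_logits, margin=0.5):
    """Hypothetical loss: pull the student toward the clean teacher and
    push it away from the noisy teacher, with one soft weight per scale.

    student, clean, noisy: one entry per scale, each a list of channel
    vectors (one vector per spatial location).
    scale_logits: raw scores assumed to come from a scale adaptation module.
    """
    weights = softmax(scale_logits)
    loss = 0.0
    for w, s, t, n in zip(weights, student, clean, noisy):
        # Pull term: average distance to the clean teacher's features.
        pull = sum(cosine_distance(a, b) for a, b in zip(s, t)) / len(s)
        # Push term (assumed hinge): penalize locations that are closer
        # than `margin` to the noisy teacher's features.
        push = sum(max(0.0, margin - cosine_distance(a, b))
                   for a, b in zip(s, n)) / len(s)
        loss += w * (pull + push)
    return loss, weights

# Toy features: two scales, with 2 and 1 spatial locations respectively.
clean_feats = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
noisy_feats = [[[-x for x in vec] for vec in scale] for scale in clean_feats]
good_loss, scale_weights = scale_weighted_contrastive_loss(
    clean_feats, clean_feats, noisy_feats, [0.0, 0.0])
bad_loss, _ = scale_weighted_contrastive_loss(
    noisy_feats, clean_feats, noisy_feats, [0.0, 0.0])
```

In this toy setup a student that matches the clean teacher incurs a near-zero loss, whereas a student that matches the noisy teacher is penalized by both terms. Because the weights sum to one, a scale with a larger logit contributes proportionally more to the total.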

Paper Structure

This paper contains 27 sections, 4 equations, 7 figures, 5 tables, 2 algorithms.

Figures (7)

  • Figure 1: Illustration of the proposed framework during training. It comprises two distinct encoding pathways: 1) a "clean" teacher encoder followed by a bottleneck, a scale adaptation mechanism, and a student decoder, and 2) a "noisy" teacher encoder. The two teacher encoders share weights but process different inputs: the clean teacher receives normal data, whereas the noisy teacher processes synthesized anomalies. We employ contrastive reverse distillation by pulling the student's reconstructed features toward the clean teacher's feature representations and pushing them away from those of the noisy teacher. The scale adaptation module generates input-specific scale weights used in this process.
  • Figure 2: Comparison of anomaly score distributions for normal (blue) and abnormal (red) samples in the test sets across datasets. Top: Distributions obtained from the baseline RD4AD [DengL22]. Bottom: Distributions generated by our proposed model. Scores are normalized to [0,1] for each subfigure to enable direct comparison. Our approach demonstrates enhanced separation, leading to improved anomaly detection performance.
  • Figure 3: Effect of $\lambda$ on model performance on the RSNA dataset. $\lambda$ ranges from 0.1 to 0.5 in 0.1 increments, with higher values corresponding to increased simplex noise levels. Performance metrics (AUC, F1, and ACC) are shown as $\lambda$ increases left to right.
  • Figure 4: Performance comparison of anomaly detection methods. Evaluation metrics include: AUROC (vertical axis), inference time (horizontal axis), and memory footprint (circle radius). Our method achieves state-of-the-art performance, delivering the highest AUROC while demonstrating superior computational efficiency. Specifically, our approach is 6x faster than PatchCore, 4x faster than GAN Ensemble, 3x faster than CFlow, and 2x faster than NSA.
  • Figure 5: Comparison of ROC curves between our method and the top 5 anomaly detection methods on the RSNA dataset.
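The Figure 2 caption mentions normalizing anomaly scores to [0,1], and Figures 4 and 5 report AUROC and ROC curves. The sketch below shows one common way to do both, assuming min-max normalization and the rank-based (pairwise) definition of AUROC; the paper's exact evaluation code is not given here, and the scores and labels are made-up toy values.

```python
def min_max_normalize(scores):
    """Rescale scores linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: all scores equal, so no spread to preserve.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def auroc(scores, labels):
    """AUROC as the probability that a random abnormal sample (label 1)
    scores higher than a random normal sample (label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one abnormal sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy anomaly scores (hypothetical): label 0 = normal, label 1 = abnormal.
raw_scores = [0.2, 0.4, 0.35, 0.9, 0.7, 0.3]
labels = [0, 0, 0, 1, 1, 1]

normalized = min_max_normalize(raw_scores)
auc_raw = auroc(raw_scores, labels)
auc_normalized = auroc(normalized, labels)
```

Min-max normalization preserves the ordering of scores, so it changes how the histograms look without changing AUROC. This is why per-subfigure normalization is a safe choice for visual comparison.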