PDD: Manifold-Prior Diverse Distillation for Medical Anomaly Detection

Xijun Lu; Hongying Liu; Fanhua Shang; Yanming Hui; Liang Wan

TL;DR

PDD (Manifold-Prior Diverse Distillation) unifies dual-teacher priors into a shared high-dimensional manifold and distills this knowledge into dual students with complementary behaviors, significantly outperforming existing state-of-the-art methods in medical image anomaly detection.

Abstract

Medical image anomaly detection faces unique challenges due to subtle, heterogeneous anomalies embedded in complex anatomical structures. Through systematic Grad-CAM analysis, we reveal that discriminative activation maps fail on medical data, unlike their success on industrial datasets, motivating the need for manifold-level modeling. We propose PDD (Manifold-Prior Diverse Distillation), a framework that unifies dual-teacher priors into a shared high-dimensional manifold and distills this knowledge into dual students with complementary behaviors. Specifically, frozen VMamba-Tiny and wide-ResNet50 encoders provide global contextual and local structural priors, respectively. Their features are unified through a Manifold Matching and Unification (MMU) module, while an Inter-Level Feature Adaption (InA) module enriches intermediate representations. The unified manifold is distilled into two students: one performs layer-wise distillation via InA for local consistency, while the other receives skip-projected representations through a Manifold Prior Affine (MPA) module to capture cross-layer dependencies. A diversity loss prevents representation collapse while maintaining detection sensitivity. Extensive experiments on multiple medical datasets demonstrate that PDD significantly outperforms existing state-of-the-art methods, achieving improvements of up to 11.8%, 5.1%, and 8.5% in AUROC on HeadCT, BrainMRI, and ZhangLab datasets, respectively, and 3.4% in F1 max on the Uni-Medical dataset, establishing new state-of-the-art performance in medical image anomaly detection. The implementation will be released at https://github.com/OxygenLu/PDD
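The results above are reported in AUROC and F1 max. As a minimal sketch of how these two image-level metrics are commonly computed from per-image anomaly scores, the pure-Python snippet below may help; the function names, the toy scores, and the convention that a score at or above the threshold counts as anomalous are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the pairwise-ranking (Mann-Whitney U) formulation.

    labels: 1 = anomalous, 0 = normal. Tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_max(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Best F1 over thresholds drawn from the observed scores.

    Returns (best_f1, threshold); a sample is flagged when score >= threshold.
    """
    best_f1, best_t = 0.0, float("inf")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_f1, best_t = f1, t
    return best_f1, best_t


# Toy data (hypothetical): six images, three of them anomalous.
scores = [0.10, 0.45, 0.40, 0.80, 0.65, 0.20]
labels = [0, 0, 1, 1, 1, 0]

print(f"AUROC  = {auroc(scores, labels):.3f}")
best_f1, threshold = f1_max(scores, labels)
print(f"F1 max = {best_f1:.3f} at threshold {threshold}")
```

On this toy data, one normal image (score 0.45) outranks one anomalous image (score 0.40), so the AUROC falls just short of 1. F1 max picks the threshold that best trades off that false positive against missed anomalies.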

Paper Structure (15 sections, 9 equations, 5 figures, 5 tables)

Introduction
Related Work
Visual Anomaly Detection
Methods Based on Knowledge Distillation
Methodology
Manifold-Unified Reverse Distillation
Intra-Backbone Feature Fusion Strategy
Manifold Space Unification of Heterogeneous Backbones
Normal Pattern Representation Diversification
Experiments
Dataset and Experimental Setup
Comparative Experiment
Ablation Study
Anomaly Localization
Conclusion

Figures (5)

Figure 1: Grad-CAM visualization of frozen VMamba and ResNet across medical and industrial images. Within each group, feature maps progress from low-dimensional to high-dimensional feature representations (top to bottom). At the same feature dimension, the dispersed and aggregated activation patterns of the two backbones form complementary prior information.
Figure 2: Overview of the proposed PDD framework. The framework employs a dual-teacher and dual-student architecture. The teachers consist of frozen VMamba-Tiny and frozen wide-ResNet50 encoders, whose intermediate features are fused via the InA module (shown in (b)) to obtain $f_{b}^i$. The two teacher encoders compress input images into distinct high-dimensional manifold spaces, which are then aligned through the MMU module. The aligned features are fed into two student networks: Student 1 distills features from InA via $\mathcal{F}_{E_{u}}^i$, while Student 2 incorporates multi-scale manifold space features from the unified manifold through MLP-based skip connections to $\mathcal{F}_{E_{p}}^i$, distilling both prior and InA features. This enables diverse reconstruction of normal samples and effective separation of anomalies.
Figure 3: Visualization of PDD pipeline and loss functions. Figure (a) illustrates the training process of PDD, which is trained exclusively on normal samples. The training simultaneously optimizes three loss functions: $l_{prp}$, $l_{kr}$, and $l_{div}$. Figure (c) shows the loss function curves over 100 epochs on the ZhangLab Chest-Xray dataset. The combined direction of the three curves is represented in the simulated loss landscape in Figure (d), with arrows indicating the direction of loss descent. In the final 100 epochs, the training converges near the optimal solution, demonstrating that PDD effectively learns features of normal samples and can separate anomalies.
Figure 4: Anomaly localization comparison between PDD and Skip-TS on the HeadCT dataset.
Figure 5: Anomaly localization comparison between PDD and RD4AD on the ZhangLab dataset. PDD produces significantly fewer false positives on normal samples, demonstrating stronger specificity.
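
The captions for Figures 2 and 3 describe two students distilled from a shared teacher representation under a joint objective that includes a diversity term. The exact forms of $l_{prp}$, $l_{kr}$, and $l_{div}$ are not given here, so the sketch below substitutes stand-ins: a cosine-distance distillation term (a common choice in reverse-distillation methods) and a hypothetical diversity penalty that discourages the two students from producing identical features. All function names, the `div_weight` value, and the summed anomaly map are assumptions for illustration only, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    """Cosine similarity between two feature vectors (eps avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def distill_loss(teacher: List[Vector], student: List[Vector]) -> float:
    """Mean cosine distance (1 - cos) between teacher and student per location."""
    dists = [1.0 - cosine_similarity(t, s) for t, s in zip(teacher, student)]
    return sum(dists) / len(dists)


def diversity_penalty(student1: List[Vector], student2: List[Vector]) -> float:
    """Hypothetical diversity term: mean similarity between the two students.

    Minimizing it pushes the students toward different representations.
    """
    sims = [cosine_similarity(a, b) for a, b in zip(student1, student2)]
    return sum(sims) / len(sims)


def total_loss(teacher, student1, student2, div_weight: float = 0.1) -> float:
    """Assumed joint objective: two distillation terms plus a weighted diversity term."""
    return (
        distill_loss(teacher, student1)
        + distill_loss(teacher, student2)
        + div_weight * diversity_penalty(student1, student2)
    )


def anomaly_map(teacher, student1, student2) -> List[float]:
    """Assumed test-time score per location: sum of both students' cosine distances."""
    return [
        (1.0 - cosine_similarity(t, a)) + (1.0 - cosine_similarity(t, b))
        for t, a, b in zip(teacher, student1, student2)
    ]


# Toy features: two spatial locations, two channels each (hypothetical values).
teacher = [[1.0, 0.0], [0.0, 1.0]]
student1 = [[1.0, 0.0], [0.0, 1.0]]  # reproduces the teacher at both locations
student2 = [[1.0, 0.0], [1.0, 0.0]]  # fails to reproduce the second location

print("total loss:", round(total_loss(teacher, student1, student2), 4))
print("anomaly map:", [round(v, 4) for v in anomaly_map(teacher, student1, student2)])
```

In this toy case the second location gets the higher score because one student fails to reproduce the teacher feature there. That is the general mechanism by which distillation-based detectors flag regions unseen during training on normal samples.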
