SD-MAD: Sign-Driven Few-shot Multi-Anomaly Detection in Medical Images

Kaiyu Guo, Tan Pan, Chen Jiang, Zijian Wang, Brian C. Lovell, Limei Han, Yuan Cheng, Mahsa Baktashmotlagh

TL;DR

This work addresses the challenge of few-shot medical anomaly detection when multiple anomaly categories may co-occur in images. It introduces SD-MAD, a two-stage sign-driven framework that uses a CLIP-based backbone with Shift Adapters to align radiological signs with anomaly categories and amplify inter-anomaly discrepancies, while inference employs automatic sign selection to reduce intra-anomaly uncertainty. The method is evaluated on seven medical imaging datasets across three evaluation protocols, showing consistent improvements in multi-label prediction and category-wise discrimination compared to CLIP-based baselines. Sign selection helps mitigate noisy prompt–anomaly mappings, though some category-specific prompts remain challenging; the approach offers a practical path toward robust multi-anomaly detection in data-scarce clinical settings.

Abstract

Medical anomaly detection (AD) is crucial for early clinical intervention, yet it faces challenges due to limited access to high-quality medical imaging data, caused by privacy concerns and data silos. Few-shot learning has emerged as a promising approach to alleviate these limitations by leveraging the large-scale prior knowledge embedded in vision-language models (VLMs). Recent advancements in few-shot medical AD have treated normal and abnormal cases as a one-class classification problem, often overlooking the distinction among multiple anomaly categories. Thus, in this paper, we propose a framework tailored for few-shot medical anomaly detection in the scenario where the identification of multiple anomaly categories is required. To capture the detailed radiological signs of medical anomaly categories, our framework incorporates diverse textual descriptions for each category, generated by a large language model, under the assumption that different anomalies in medical images may share common radiological signs in each category. Specifically, we introduce SD-MAD, a two-stage Sign-Driven few-shot Multi-Anomaly Detection framework: (i) radiological signs are aligned with anomaly categories by amplifying inter-anomaly discrepancy; (ii) aligned signs are further selected by an automatic sign selection strategy at inference, to mitigate the under-fitting and uncertain-sample issues caused by limited medical data. Moreover, we propose three protocols to comprehensively quantify the performance of multi-anomaly detection. Extensive experiments illustrate the effectiveness of our method.
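
As a rough illustration of how several sign descriptions per category might be scored against one image, the following Python sketch computes cosine similarities between an image embedding and each category's sign embeddings, then averages the top-k most similar signs. This is not the paper's sign selection strategy, whose details are not given here: the function names, the dictionary layout, and the top-k rule are all assumptions chosen for illustration, and the embeddings are placeholder vectors rather than CLIP outputs.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products become cosine similarities."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def category_scores(image_emb, sign_embs, top_k=2):
    """Score each anomaly category from its sign-description embeddings.

    image_emb: array of shape (d,), a single image embedding.
    sign_embs: dict mapping category name -> array of shape (n_signs, d).
    top_k:     number of most similar signs averaged per category
               (hypothetical stand-in for an automatic selection rule).
    """
    img = l2_normalize(image_emb)
    scores = {}
    for category, embs in sign_embs.items():
        sims = l2_normalize(embs) @ img          # cosine similarity per sign
        k = min(top_k, sims.shape[0])
        scores[category] = float(np.sort(sims)[-k:].mean())
    return scores


# Toy usage with 2-D placeholder embeddings.
toy_image = np.array([1.0, 0.0])
toy_signs = {
    "category_a": np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]),
    "category_b": np.array([[0.0, 1.0], [0.0, 2.0]]),
}
toy_scores = category_scores(toy_image, toy_signs, top_k=2)
```

In a multi-label setting, each category score would then be thresholded independently, so that an image can be assigned more than one anomaly category.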

Paper Structure

This paper contains 15 sections, 11 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Panels (a) and (b) visualize the difference between our task and previous tasks. Panels (c) and (d) illustrate a multi-anomaly scenario and the radiological signs of different medical anomalies in brain MRI.
  • Figure 2: The pipeline of SD-MAD. In the framework, the training phase is designed to amplify inter-anomaly discrepancies, and the inference stage aims to handle the uncertain-sample problem in each anomaly category.
  • Figure 3: The ablation study on $\lambda$. We conduct the experiments on the multi-label prediction task with two metrics, namely Hamming score and Subset accuracy.
  • Figure 4: Visualization of image-text similarity heatmaps. (a) visualizes the heatmap on vanilla CLIP, (b) visualizes the heatmap on our trained model. The correspondence between prompts and anomaly categories is provided in (c).
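
The two multi-label metrics named in the Figure 3 caption can be sketched in Python as follows. The exact definitions used in the paper are not stated here, so this assumes the common conventions: subset accuracy is the fraction of samples whose entire label vector is predicted exactly, and Hamming score is taken as one minus the Hamming loss, i.e. the fraction of individual label slots predicted correctly. The function names and toy labels are illustrative.

```python
import numpy as np


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector matches exactly."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    return float(np.mean(np.all(y_true == y_pred, axis=1)))


def hamming_score(y_true, y_pred):
    """Fraction of individual label slots predicted correctly.

    Assumed definition: 1 - Hamming loss. Other sources define the
    Hamming score as a per-sample intersection-over-union instead.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    return float(np.mean(y_true == y_pred))


# Toy example: 3 images, 3 anomaly categories (1 = present).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0]]

subset_acc = subset_accuracy(y_true, y_pred)   # only the first row matches fully
ham_score = hamming_score(y_true, y_pred)      # 7 of 9 label slots agree
```

Subset accuracy is the stricter of the two, since a single wrong label makes the whole sample count as incorrect, while the Hamming score gives partial credit for each correctly predicted category.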

Theorems & Definitions (3)

  • Remark 3.1
  • Definition 3.2
  • Remark 3.3