MedIAnomaly: A comparative study of anomaly detection in medical images

Yu Cai, Weiwen Zhang, Hao Chen, Kwang-Ting Cheng

TL;DR

MedIAnomaly addresses the lack of fair evaluation in medical anomaly detection (AD) by introducing a unified benchmark across seven datasets and five modalities. It conducts a large-scale comparison of 30 methods spanning reconstruction-based, self-supervised learning (SSL), and feature-reference-based AD, covering image-level and pixel-level tasks with threshold-independent metrics. The study analyzes the influence of four core components (latent-space configuration, reconstruction error distance, DDPM denoising, and ImageNet pretraining) and reports several findings: reconstruction-based methods can outperform SSL without pretrained weights, while SSL often benefits from realistic synthetic anomalies and two-stage setups; ImageNet weights provide strong baselines but require careful adaptation for medical data. The results yield practical guidance for practitioners and point to directions for future research, including learning task-specific distance metrics, adaptive latent-space management, 3D anomaly detection, and vision-language models in medical AD.
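
As an illustration of what a threshold-independent metric looks like, the sketch below computes the area under the ROC curve (AUROC) from anomaly scores and binary labels. AUROC is one common metric of this kind; that it matches the benchmark's exact metric set is an assumption, and the function name `auroc` and the toy scores are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: anomaly scores (higher means more anomalous).
    labels: 1 for abnormal, 0 for normal.
    No decision threshold is chosen: only the ranking of scores matters.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one abnormal sample")
    # Count (abnormal, normal) pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores): 3 of the 4 abnormal/normal pairs are ordered correctly.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auroc(scores, labels))  # 0.75
```

The pairwise form is quadratic in the number of samples, which is fine for a sketch; a sort-based implementation would be preferable on large test sets.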

Abstract

Anomaly detection (AD) aims to detect abnormal samples that deviate from expected normal patterns. It can generally be trained on normal data alone, without requiring abnormal samples, and thereby plays an important role in rare disease recognition and health screening in the medical domain. Despite the emergence of numerous methods for medical AD, the lack of a fair and comprehensive evaluation leads to ambiguous conclusions and hinders the development of the field. To address this problem, this paper builds a benchmark with unified comparison. Seven medical datasets covering five image modalities (chest X-rays, brain MRIs, retinal fundus images, dermatoscopic images, and histopathology images) are curated for extensive evaluation. Thirty typical AD methods, including reconstruction-based and self-supervised learning-based methods, are compared on image-level anomaly classification and pixel-level anomaly segmentation. Furthermore, for the first time, we systematically investigate the effect of key components in existing methods, revealing unresolved challenges and potential future directions. The datasets and code are available at https://github.com/caiyu6666/MedIAnomaly.

Paper Structure (40 sections, 11 figures, 14 tables)


Figures (11)

  • Figure 1: Overview of reconstruction-based anomaly detection. The reconstruction model is trained to minimize reconstruction loss on normal images. During inference, the trained model is assumed to be unable to reconstruct lesions in abnormal images, which in turn yield a high reconstruction error.
  • Figure 2: Overview of the two paradigms for self-supervised anomaly detection. (a) The one-stage approach trains a model to detect manually synthesized anomalies, and directly applies this model to detect real anomalies. (b) The two-stage approach first learns self-supervised representations through a pretext task on the normal training data, and then builds a one-class classifier on the learned representations.
  • Figure 3: Examples of datasets deemed too simple for AD, including the Hyper-Kvasir and OCT2017 datasets.
  • Figure 4: Statistics of slice indices in our processed BraTS2021 dataset. A central crop was performed before slice extraction, so index 0 corresponds to index 50 of the original volume and index 69 corresponds to index 119 of the original volume.
  • Figure 5: Examples of the collected datasets for AD. Samples enclosed by the green dashed circle are normal, while others are abnormal.
  • ...and 6 more figures
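
The reconstruction-based scheme described in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the "reconstruction model" here is a deliberately trivial stand-in (the per-pixel mean of the normal training images) rather than a trained autoencoder, and all function names and pixel values are hypothetical.

```python
def fit_mean_template(normal_images):
    """Stand-in 'reconstruction model': the per-pixel mean of normal images.

    Among models that output one fixed image, this minimizes the squared
    reconstruction loss on the normal training set. A real method would
    train an autoencoder or a similar network instead (assumption).
    """
    n = len(normal_images)
    size = len(normal_images[0])
    return [sum(img[i] for img in normal_images) / n for i in range(size)]


def error_map(image, reconstruction):
    """Pixel-level anomaly map: squared error between input and reconstruction."""
    return [(x - r) ** 2 for x, r in zip(image, reconstruction)]


def image_score(image, reconstruction):
    """Image-level anomaly score: mean of the pixel-level error map."""
    errs = error_map(image, reconstruction)
    return sum(errs) / len(errs)


# Toy flattened 4-pixel "images"; the values are made up for illustration.
normal_train = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.2, 1.0, 0.8]]
template = fit_mean_template(normal_train)

normal_test = [0.0, 0.1, 1.0, 0.9]
abnormal_test = [0.0, 0.1, 0.0, 0.9]  # third pixel plays the role of a lesion

print(image_score(normal_test, template))
print(image_score(abnormal_test, template))
print(error_map(abnormal_test, template))
```

The abnormal input receives the higher image-level score, and its error map peaks at the altered pixel, which is the behavior the caption assumes for lesions. The squared error used here is only one possible reconstruction error distance.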