MAEDiff: Masked Autoencoder-enhanced Diffusion Models for Unsupervised Anomaly Detection in Brain Images

Rui Xu, Yunke Wang, Bo Du

TL;DR

MAEDiff tackles unsupervised anomaly detection in brain MRI by reconstructing healthy references with a hierarchical patch-based diffusion framework. It introduces an MAE-enhanced diffusion U-Net that processes overlapping upper-level patches and applies a masked autoencoder mechanism on sub-level grids to strengthen the conditioning on visible (unnoised) regions; at inference, outputs are averaged across overlapping patches. Experiments on BraTS21 and multiple sclerosis (MS) lesion data report better anomaly detection performance (Dice and AUPRC) and solid reconstruction quality compared with patch-based DDPM baselines and other methods. The approach advances unsupervised medical image anomaly detection by making better use of global context and fine-grained brain anatomy, with potential extensions to lesion-aware patch selection and cross-domain generalization.

Abstract

Unsupervised anomaly detection has gained significant attention in medical imaging because it removes the need for costly pixel-level annotation. Modern approaches usually use generative models to produce healthy references of the diseased images and then identify abnormalities by comparing the healthy references with the original diseased images. Recently, diffusion models have shown promising potential for unsupervised anomaly detection in medical images owing to their good mode coverage and high sample quality. However, the intrinsic characteristics of medical images, e.g. low contrast, and the intricate anatomical structure of the human body make reconstruction challenging. In addition, the global information of medical images often remains underutilized. To address these two issues, we propose a novel Masked Autoencoder-enhanced Diffusion Model (MAEDiff) for unsupervised anomaly detection in brain images. MAEDiff uses a hierarchical patch partition: it generates healthy images from overlapping upper-level patches and applies a mechanism based on masked autoencoders to the sub-level patches to strengthen the conditioning on the unnoised regions. Extensive experiments on tumor and multiple sclerosis lesion data demonstrate the effectiveness of our method.
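
The comparison step described in the abstract can be illustrated with a minimal sketch. It assumes the anomaly score is the pixel-wise absolute difference between the original image and its healthy reference, and that a fixed threshold turns scores into a binary mask; the function names, the absolute-difference choice, and the threshold value are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def anomaly_score_map(image, reference):
    """Pixel-wise absolute difference (an assumed scoring rule)."""
    image = np.asarray(image, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    if image.shape != reference.shape:
        raise ValueError("image and reference must have the same shape")
    return np.abs(image - reference)


def dice_score(pred_mask, gt_mask, eps=1e-8):
    """Dice overlap between two binary masks; eps avoids division by zero."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    intersection = np.logical_and(pred, gt).sum()
    return 2.0 * intersection / (pred.sum() + gt.sum() + eps)


# Toy example: a single bright "lesion" pixel that the reference lacks.
image = np.zeros((4, 4))
image[1, 1] = 1.0
reference = np.zeros((4, 4))      # stand-in for a reconstructed healthy image
scores = anomaly_score_map(image, reference)
pred_mask = scores > 0.5          # hypothetical threshold
gt_mask = image > 0.5
dice = dice_score(pred_mask, gt_mask)
```

In practice the threshold would be chosen on held-out data, whereas a threshold-free metric such as AUPRC is computed directly from the score map.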

Paper Structure

This paper contains 24 sections, 8 equations, 3 figures, 2 tables, 2 algorithms.

Figures (3)

  • Figure 1: Overall pipeline of the proposed Masked Autoencoder-enhanced Diffusion Model (MAEDiff), which uses a hierarchical partition strategy. The input image $\boldsymbol{x}_{0}$ is first divided into larger upper-level $p \times p$ patches, and then further into smaller sub-level $r \times r$ grids. (a) In the training phase, one patch is randomly selected for patch-wise reconstruction. Specifically, the selected patch is diffused by the forward process and reconstructed by the reverse process using the partially perturbed $\tilde{\boldsymbol{x}}_{t}$. The sub-level division mainly operates on the feature map to strengthen the conditioning on the visible (unnoised) region $\hat{\boldsymbol{x}}_{0}$. (b) In the testing phase, the patch-wise reconstruction is performed across the image by sliding horizontally and vertically with a step size of $s$. The anomaly score map is obtained by comparing the original diseased image with the reconstructed healthy reference pixel by pixel.
  • Figure 2: The generator architecture of MAEDiff, a typical diffusion U-Net integrated with an MAE-like mechanism. Specifically, the feature map extracted by the U-Net encoder is fed into the MAE module. The visible feature map is processed by the MAE encoder, and then processed along with the entire feature map by the MAE decoder. The output of the MAE module is reshaped and upsampled to be compatible with the rest of the U-Net.
  • Figure 3: Qualitative comparison of MAEDiff and previous approaches. Columns (a)-(d) display the reconstruction results and the anomaly score maps for AnoDDPM, pDDPM (s48), pDDPM (s16), and MAEDiff, respectively. Column GT presents the original images and their corresponding ground truth annotations for reference.
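
The testing procedure in the Figure 1 caption (slide a $p \times p$ window with step size $s$, reconstruct each patch, and average overlapping outputs) can be sketched as follows. This is a rough illustration under stated assumptions: `reconstruct_patch` is a hypothetical callable standing in for the trained diffusion model, the extra border window added when $s$ does not evenly tile the image is an assumed convention, and an identity function replaces the model so that the example runs on its own.

```python
import numpy as np


def sliding_positions(size, p, s):
    """Top-left offsets of length-p windows along one axis with step s."""
    if p > size:
        raise ValueError("patch size must not exceed image size")
    positions = list(range(0, size - p + 1, s))
    if positions[-1] != size - p:
        # Assumption: add one last window so the image border is covered.
        positions.append(size - p)
    return positions


def reconstruct_by_patches(image, reconstruct_patch, p, s):
    """Average patch-wise reconstructions over all overlapping windows."""
    h, w = image.shape
    acc = np.zeros((h, w), dtype=np.float64)    # sum of patch outputs
    count = np.zeros((h, w), dtype=np.float64)  # windows covering each pixel
    for top in sliding_positions(h, p, s):
        for left in sliding_positions(w, p, s):
            patch = reconstruct_patch(image, top, left, p)
            acc[top:top + p, left:left + p] += patch
            count[top:top + p, left:left + p] += 1.0
    return acc / count


def identity_patch(image, top, left, p):
    """Placeholder for the model: returns the input patch unchanged."""
    return image[top:top + p, left:left + p]


image = np.arange(64, dtype=np.float64).reshape(8, 8)
reference = reconstruct_by_patches(image, identity_patch, p=4, s=2)
```

With a real model, `reconstruct_patch` would noise only the selected patch and denoise it while conditioning on the rest of the image; a smaller step $s$ gives more overlap per pixel at the cost of more model evaluations.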