Harnessing EHRs for Diffusion-based Anomaly Detection on Chest X-rays
Harim Kim, Yuhan Wang, Minkyu Ahn, Heeyoul Choi, Yuyin Zhou, Charmgil Hong
TL;DR
This work tackles unsupervised anomaly detection in chest radiographs by addressing the limitation of diffusion-based methods that rely solely on imaging features. The authors propose Diff3M, a diffusion-based, multimodal framework that conditions the reverse diffusion process on structured Electronic Health Records (EHRs) via an Image-EHR Cross Attention (IECA) mechanism and reinforces robust normal-like reconstruction with a Pixel-level Checkerboard Masking (PCM) strategy. Training on normal data combines a Noise Prediction (NP) objective and a Masked Pixel Generation (MPG) reconstruction objective, formalized as $\mathcal{L}_{Diff3M} = \mathbb{E}_{t,\mathbf{x},\mathbf{c}_r,\epsilon}[ \lambda \|\epsilon_{\theta}^{(t)} - \epsilon^{(t)}\|_2^2 + (1-\lambda) \| \tilde{\mathbf{x}}_t - \mathbf{x}_t\|_2^2 ]$, with DDIM sampling used during inference for efficiency. Experiments on CheXpert and MIMIC-CXR/IV show state-of-the-art performance, demonstrating that incorporating EHR context via IECA improves anomaly detection and localization, particularly when richer EHR features (beyond demographics) are available. The study highlights BMI as a dominant EHR contributor in many cases, suggesting the clinical value of multimodal conditioning for distinguishing normal anatomical variation from pathology. Overall, Diff3M advances medical UAD by fusing imaging and structured clinical information to yield more reliable anomaly detection and interpretability in chest X-ray analysis.
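The weighted objective above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `checkerboard_mask` and `diff3m_style_loss`, the `cell` parameter, and the default `lam=0.5` are hypothetical and chosen only for illustration. The sketch assumes that a checkerboard mask alternates kept and hidden pixels, and that the expectation in the loss is approximated by a batch mean.

```python
import numpy as np

def checkerboard_mask(height, width, cell=1):
    """Binary checkerboard: 1.0 where (row // cell + col // cell) is even, else 0.0.

    `cell` (side length of each square) is an illustrative parameter,
    not something specified in the summary above.
    """
    rows = np.arange(height)[:, None] // cell
    cols = np.arange(width)[None, :] // cell
    return ((rows + cols) % 2 == 0).astype(np.float32)

def diff3m_style_loss(eps_pred, eps_true, x_recon, x_target, lam=0.5):
    """Weighted sum of a noise-prediction term and a reconstruction term.

    All inputs have shape (batch, ...). Each term is a squared L2 norm per
    sample; the batch mean stands in for the expectation in the formula.
    """
    batch = eps_pred.shape[0]
    np_term = ((eps_pred - eps_true) ** 2).reshape(batch, -1).sum(axis=1)
    mpg_term = ((x_recon - x_target) ** 2).reshape(batch, -1).sum(axis=1)
    return float(np.mean(lam * np_term + (1.0 - lam) * mpg_term))
```

With `lam` close to 1 the noise-prediction term dominates, and with `lam` close to 0 the reconstruction term dominates. How the mask is applied to the model input is not specified in this summary, so the sketch only constructs it.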
Abstract
Unsupervised anomaly detection (UAD) in medical imaging is crucial for identifying pathological abnormalities without requiring extensive labeled data. However, existing diffusion-based UAD models rely solely on imaging features, limiting their ability to distinguish between normal anatomical variations and pathological anomalies. To address this, we propose Diff3M, a multi-modal diffusion-based framework that integrates chest X-rays and structured Electronic Health Records (EHRs) for enhanced anomaly detection. Specifically, we introduce a novel image-EHR cross-attention module to incorporate structured clinical context into the image generation process, improving the model's ability to differentiate normal from abnormal features. Additionally, we develop a static masking strategy to enhance the reconstruction of normal-like images from anomalies. Extensive evaluations on CheXpert and MIMIC-CXR/IV demonstrate that Diff3M achieves state-of-the-art performance, outperforming existing UAD methods in medical imaging. Our code is available at https://github.com/nth221/Diff3M
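To make the idea of an image-EHR cross-attention module concrete, the following is a generic scaled dot-product cross-attention sketch in NumPy. It is an assumption-laden illustration rather than the module from the paper: it assumes image features act as queries and embedded EHR features act as keys and values, and the names `cross_attention`, `w_q`, `w_k`, and `w_v` are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(image_tokens, ehr_tokens, w_q, w_k, w_v):
    """Generic cross-attention: image tokens attend over EHR tokens.

    image_tokens: (n_img, d_img), ehr_tokens: (n_ehr, d_ehr).
    w_q: (d_img, d), w_k and w_v: (d_ehr, d) are projection matrices
    (assumed learned in a real model; passed in explicitly here).
    Returns the attended features (n_img, d) and the weights (n_img, n_ehr).
    """
    q = image_tokens @ w_q
    k = ehr_tokens @ w_k
    v = ehr_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights
```

Each row of the returned weights sums to 1 and indicates how strongly one image token draws on each EHR feature, which is one plausible way to inspect per-feature contributions under these assumptions.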
