Leveraging Clinical Text and Class Conditioning for 3D Prostate MRI Generation
Emerson P. Grabke, Babak Taati, Masoom A. Haider
TL;DR
This work tackles data scarcity in medical image synthesis by introducing CCELLA, a dual-head adapter that jointly conditions a latent diffusion model on full radiology reports and PI-RADS-derived class information. Trained in a data-efficient pipeline that reuses pretrained components, CCELLA achieves state-of-the-art 3D prostate MRI generation (3D FID = 0.025) and improves downstream cancer classification when its synthetic images augment real training data. The method combines an LLM-based text embedding with a class-conditioned timestep embedding and is optimized with a joint loss that balances image fidelity and radiology classification accuracy. Results show improved synthetic image quality and classifier performance under limited data and minimal annotation, suggesting that CCELLA can make medical LDMs more accessible and useful while reducing data requirements.
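The dual-head conditioning described above can be illustrated with a minimal NumPy sketch. It assumes a hypothetical design that this summary does not confirm: one head linearly projects per-token LLM report embeddings into a cross-attention context for the U-Net, while the other looks up an embedding for the PI-RADS-derived class and adds it to a sinusoidal timestep embedding. The names `DualHeadAdapterSketch` and `sinusoidal_timestep_embedding`, the dimensions, and the random (untrained) weights are all illustrative; the LLM encoder and the 3D U-Net that would consume these outputs are not shown.

```python
import numpy as np


def sinusoidal_timestep_embedding(t: int, dim: int) -> np.ndarray:
    """Standard sinusoidal diffusion-timestep embedding of length `dim` (must be even)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.sin(args), np.cos(args)])


class DualHeadAdapterSketch:
    """Hypothetical dual-head conditioning adapter (illustrative, untrained).

    Head 1 (text): projects per-token report embeddings from a frozen LLM
    into the U-Net's cross-attention context space.
    Head 2 (class): embeds a discrete class label (e.g. PI-RADS-derived)
    and adds it to the timestep embedding.
    """

    def __init__(self, llm_dim: int, context_dim: int, time_dim: int,
                 num_classes: int, seed: int = 0) -> None:
        if time_dim % 2:
            raise ValueError("time_dim must be even")
        rng = np.random.default_rng(seed)
        self.time_dim = time_dim
        # Random stand-ins for weights that would be learned during training.
        self.w_text = rng.standard_normal((llm_dim, context_dim)) / np.sqrt(llm_dim)
        self.class_table = rng.standard_normal((num_classes, time_dim))

    def __call__(self, token_embeddings: np.ndarray, class_label: int, t: int):
        # Text head: (num_tokens, llm_dim) -> (num_tokens, context_dim).
        context = token_embeddings @ self.w_text
        # Class head: class embedding added to the timestep embedding.
        cond_t = sinusoidal_timestep_embedding(t, self.time_dim) + self.class_table[class_label]
        return context, cond_t


if __name__ == "__main__":
    adapter = DualHeadAdapterSketch(llm_dim=64, context_dim=32, time_dim=16, num_classes=2)
    # Stand-in for LLM token embeddings of one radiology report.
    report_tokens = np.random.default_rng(1).standard_normal((12, 64))
    context, cond_t = adapter(report_tokens, class_label=1, t=500)
    print(context.shape, cond_t.shape)  # (12, 32) (16,)
```

Keeping the two heads separate lets the full free-text report steer spatial detail through cross-attention while the class label acts as a global signal on every denoising step, which is one plausible reading of "dual-head" conditioning.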
Abstract
Objective: Latent diffusion models (LDMs) could alleviate the data scarcity that hampers machine learning development for medical imaging. However, medical LDM strategies typically rely on short-prompt text encoders, nonmedical LDMs, or large data volumes, which can limit both performance and scientific accessibility. We propose a novel LDM conditioning approach to address these limitations.
Methods: We propose the Class-Conditioned Efficient Large Language model Adapter (CCELLA), a novel dual-head conditioning approach that simultaneously conditions the LDM U-Net on free-text clinical reports and radiology classification. We also propose a data-efficient LDM pipeline built around CCELLA and a proposed joint loss function. We first evaluate our method against the state of the art on 3D prostate MRI. We then augment the training dataset of a downstream classifier with synthetic images from our method.
Results: Our method achieves a 3D FID score of 0.025 on a size-limited 3D prostate MRI dataset, significantly outperforming a recent foundation model (FID 0.070). When training a classifier for prostate cancer prediction, adding synthetic images generated by our method improves classifier accuracy from 69% to 74% and outperforms classifiers trained on images generated by the prior state of the art. A classifier trained solely on our method's synthetic images achieved performance comparable to one trained on real images.
Conclusion: We show that our method improves both synthetic image quality and downstream classifier performance using limited data and minimal human annotation.
Significance: The proposed CCELLA-centric pipeline enables radiology report- and class-conditioned LDM training for high-quality medical image synthesis given limited data volume and human annotation, improving LDM performance and scientific accessibility.
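The 3D FID scores quoted above are Fréchet distances between Gaussian fits to feature vectors of real and synthetic volumes: d² = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^(1/2)). A minimal sketch of that distance follows, assuming the features have already been extracted by some pretrained 3D network; which network and preprocessing produced the reported numbers is not stated here and is not shown. The function name `frechet_distance` is illustrative.

```python
import numpy as np
from scipy import linalg


def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """Fréchet distance between Gaussian fits of two (n_samples, n_features) arrays."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_fake, rowvar=False)
    diff = mu_r - mu_g
    # Matrix square root of the covariance product; discard tiny imaginary
    # parts that can arise from numerical error.
    covmean = linalg.sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * np.trace(covmean))
```

As a sanity check, identical feature sets give a distance of (numerically) zero, and shifting every feature by a constant c adds d·c² for d feature dimensions. Absolute values such as 0.025 versus 0.070 are only comparable when the same feature extractor and preprocessing are used for both models.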
