Leveraging Clinical Text and Class Conditioning for 3D Prostate MRI Generation

Emerson P. Grabke, Babak Taati, Masoom A. Haider

TL;DR

This work tackles data scarcity in medical image synthesis by introducing CCELLA, a dual-head adapter that jointly conditions a latent diffusion model on full radiology reports and PI-RADS-derived class information. Trained in a data-efficient pipeline that reuses pretrained components, CCELLA demonstrates state-of-the-art 3D prostate MRI generation (3D FID = $0.025$) and benefits downstream cancer classification when synthetic images augment real data. The method blends an LLM-based text embedding with a class-conditioning timestep, optimized by a joint loss that balances image fidelity and radiology classification accuracy. Results show improved synthetic image quality and classifier performance under limited data and minimal annotation, suggesting CCELLA can enhance medical LDM accessibility and utility while mitigating data requirements.
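
For readers unfamiliar with the metric, the Fréchet distance underlying FID-style scores compares Gaussian fits to feature embeddings of real and generated images. The general form is given below; the feature extractor used for the 3D FID reported here is not specified in this summary, so the symbols $\mu_r, \Sigma_r$ (real-image feature mean and covariance) and $\mu_g, \Sigma_g$ (generated-image feature mean and covariance) are generic notation rather than the authors' own.

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
$$

Lower is better: the score is zero when the two feature distributions have identical means and covariances.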

Abstract

Objective: Latent diffusion models (LDM) could alleviate data scarcity challenges affecting machine learning development for medical imaging. However, medical LDM strategies typically rely on short-prompt text encoders, nonmedical LDMs, or large data volumes. These strategies can limit performance and scientific accessibility. We propose a novel LDM conditioning approach to address these limitations. Methods: We propose Class-Conditioned Efficient Large Language model Adapter (CCELLA), a novel dual-head conditioning approach that simultaneously conditions the LDM U-Net with free-text clinical reports and radiology classification. We also propose a data-efficient LDM pipeline centered around CCELLA and a proposed joint loss function. We first evaluate our method on 3D prostate MRI against state-of-the-art. We then augment a downstream classifier model training dataset with synthetic images from our method. Results: Our method achieves a 3D FID score of 0.025 on a size-limited 3D prostate MRI dataset, significantly outperforming a recent foundation model with FID 0.070. When training a classifier for prostate cancer prediction, adding synthetic images generated by our method during training improves classifier accuracy from 69% to 74% and outperforms classifiers trained on images generated by prior state-of-the-art. Classifier training solely on our method's synthetic images achieved comparable performance to real image training. Conclusion: We show that our method improved both synthetic image quality and downstream classifier performance using limited data and minimal human annotation. Significance: The proposed CCELLA-centric pipeline enables radiology report and class-conditioned LDM training for high-quality medical image synthesis given limited data volume and human data annotation, improving LDM performance and scientific accessibility.
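
The abstract mentions a joint loss but does not state its form here. As a minimal sketch only, one plausible shape is a noise-prediction MSE term plus a weighted cross-entropy term on the class head. Everything in the block below is an assumption made for illustration: the function names, the `weight` parameter and its default, and the choice of MSE and cross-entropy are hypothetical and may differ from the paper's actual loss.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits, label):
    """Cross-entropy for one sample, computed from raw logits.

    Uses the log-sum-exp trick (subtracting the max logit) for stability.
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def joint_loss(noise_pred, noise_true, class_logits, class_label, weight=0.1):
    """Hypothetical joint loss: diffusion MSE + weight * classification CE.

    `weight` is an assumed hyperparameter trading off image fidelity
    against radiology classification accuracy; it is not from the paper.
    """
    return mse(noise_pred, noise_true) + weight * cross_entropy(class_logits, class_label)


if __name__ == "__main__":
    # Toy values only: a 4-element "noise" vector and two-class logits.
    loss = joint_loss(
        noise_pred=[0.1, -0.2, 0.0, 0.3],
        noise_true=[0.0, -0.1, 0.1, 0.2],
        class_logits=[1.5, -0.5],
        class_label=0,
    )
    print(f"joint loss: {loss:.4f}")
```

Setting `weight=0` in this sketch reduces it to a plain denoising objective, which makes the contribution of the classification term easy to isolate in an ablation.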

Paper Structure

This paper contains 15 sections, 1 equation, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Overview of the CCELLA-centric pipeline (left) and the CCELLA adapter (right). Image encoder ($E_{Image}$) and decoder ($D_{Image}$) from MAISI (Guo et al., 2025). Text encoder ($E_{Text}$) from Chung et al. (2024). CCELLA consists of six CC-TSC blocks with classifier ($h_C$) and aligned text embedding ($h_T$) heads. TSC block originally from ELLA (Hu et al., 2024). Pipeline input includes the radiology report text for training and inference, with image and radiology classification additionally included for training alone. Example text redacted for privacy.
  • Figure 2: Four real axial T2 prostate MRI images and the synthetic MRI images generated by ELLAMAISI, PathLDM, and CCELLA conditioned on the same corresponding report text.