RadGazeGen: Radiomics and Gaze-guided Medical Image Generation using Diffusion Models

Moinak Bhattacharya, Gagandeep Singh, Shubham Jain, Prateek Prasanna

Abstract

In this work, we present RadGazeGen, a novel framework for integrating experts' eye gaze patterns and radiomic feature maps as controls to text-to-image diffusion models for high-fidelity medical image generation. Despite the recent success of text-to-image diffusion models, text descriptions are often inadequate and fail to convey the detailed disease-specific information these models need to generate clinically accurate images. The anatomy, disease texture patterns, and location of the disease are extremely important for generating realistic images; moreover, the fidelity of image generation can have significant implications for downstream tasks involving disease diagnosis or treatment response assessment. Hence, there is a growing need to carefully define the controls used in diffusion models for medical image generation. Radiologists' eye gaze patterns carry important visuo-cognitive information, indicative of subtle disease patterns and their spatial location. Radiomic features further provide important subvisual cues regarding disease phenotype. In this work, we propose to use these gaze patterns in combination with standard radiomics descriptors as controls to generate anatomically correct and disease-aware medical images. RadGazeGen is evaluated for image generation quality and diversity on the REFLACX dataset. To demonstrate clinical applicability, we also report classification performance on the generated images from the CheXpert test set (n=500) and long-tailed learning performance on the MIMIC-CXR-LT test set (n=23550).
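The abstract describes combining radiologists' gaze patterns with radiomic feature maps as spatial controls for a diffusion model. The sketch below shows one way such a multi-channel control could be assembled, under loudly stated assumptions: fixations are assumed to arrive as `(x, y, duration)` pixel tuples and are rendered as a duration-weighted Gaussian heatmap; a simple local-variance filter stands in for a radiomics texture map (it is not one of the paper's radiomics filters); and the names `gaze_heatmap`, `local_variance_map`, and `build_control` are hypothetical. It is an illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical helpers for illustration only; not RadGazeGen's released code.

def gaze_heatmap(fixations, shape, sigma=15.0):
    """Render fixations as a duration-weighted sum of isotropic Gaussians in [0, 1].

    Assumption: each fixation is an (x, y, duration) tuple in pixel coordinates.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros((h, w), dtype=np.float64)
    for x, y, dur in fixations:
        heat += dur * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma**2))
    peak = heat.max()
    return heat / peak if peak > 0 else heat


def local_variance_map(image, radius=2):
    """Stand-in texture map: intensity variance in a (2r+1) x (2r+1) window.

    Assumption: this first-order statistic substitutes for the paper's
    radiomics filter maps, which this sketch does not reproduce.
    """
    k = 2 * radius + 1
    padded = np.pad(image.astype(np.float64), radius, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return windows.var(axis=(-2, -1))  # shape (H, W), same as the input image


def build_control(image, fixations, sigma=15.0, radius=2):
    """Stack the gaze heatmap and a max-normalized texture map into a (2, H, W) control."""
    heat = gaze_heatmap(fixations, image.shape, sigma)
    tex = local_variance_map(image, radius)
    tex_max = tex.max()
    if tex_max > 0:
        tex = tex / tex_max
    return np.stack([heat, tex], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cxr = rng.random((64, 64))  # placeholder array, not a real CXR
    ctrl = build_control(cxr, [(20, 30, 0.4), (45, 25, 1.2)])
    print(ctrl.shape)  # (2, 64, 64)
```

Stacking the maps as separate channels keeps each cue independently inspectable; a ControlNet-style branch could then consume the stacked tensor, although how RadGazeGen actually fuses its controls is described only at the level of the figures.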

Paper Structure

This paper contains 21 sections, 6 equations, 7 figures, and 5 tables.

Figures (7)

  • Figure 1: A. Baseline methods add generic controls (e.g., text) to diffusion models for medical image generation. B. Our method, RadGazeGen, combines different radiomics filter maps and a lung segmentation mask with the radiologist's eye gaze patterns and uses these controls for clinically accurate medical image generation.
  • Figure 2: Overview of the RadGazeGen pipeline. A. Generated CXRs for different radiomics filter maps and the lung segmentation mask as controls. B. Generated CXRs when more than one control is fused. C. Different components of Rad-CN and HVA-CN.
  • Figure 3: Overview of the Rad-CN and HVA-CN architecture. The pre-trained SD model is locked, and Rad-CN (shown in A) and HVA-CN (shown in B) are fine-tuned as trainable copies of the SD model.
  • Figure 4: Hypotheses generation. HVA maps are thresholded to create masks, which are then mapped to various disease pathologies to generate hypotheses.
  • Figure 5: Qualitative comparison. Generated CXRs for different baselines are shown alongside the corresponding radiologist's text and the original image. We also show the radiologist's annotations of disease occurrence (red bounding boxes).
  • ...and 2 more figures
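Figure 4 describes generating hypotheses by thresholding HVA maps into masks and mapping those masks to disease pathologies. The sketch below illustrates one possible version of that step under assumptions that are not taken from the paper: the HVA map is normalized to [0, 1], pathology regions are given as axis-aligned boxes `(x0, y0, x1, y1)`, and a pathology becomes a hypothesis when the thresholded mask covers at least `min_coverage` of its box. The thresholds, the example labels, and the functions `hva_mask`, `box_mask`, and `hypotheses` are all hypothetical.

```python
import numpy as np

# Hypothetical sketch of HVA-map thresholding -> pathology hypotheses.

def hva_mask(hva, thresh=0.5):
    """Binarize a human visual attention (HVA) map assumed to lie in [0, 1]."""
    return hva >= thresh


def box_mask(box, shape):
    """Rasterize an (x0, y0, x1, y1) box (assumed half-open pixel coords) to a boolean mask."""
    x0, y0, x1, y1 = box
    m = np.zeros(shape, dtype=bool)
    m[y0:y1, x0:x1] = True
    return m


def hypotheses(hva, boxes, thresh=0.5, min_coverage=0.3):
    """Return sorted pathology labels whose box is sufficiently covered by the HVA mask.

    boxes: dict mapping label -> (x0, y0, x1, y1).
    coverage = |mask AND box| / |box| (assumed criterion, not from the paper).
    """
    mask = hva_mask(hva, thresh)
    out = []
    for label, box in boxes.items():
        b = box_mask(box, hva.shape)
        area = b.sum()
        if area == 0:
            continue  # skip degenerate boxes
        coverage = (mask & b).sum() / area
        if coverage >= min_coverage:
            out.append(label)
    return sorted(out)


if __name__ == "__main__":
    hva = np.zeros((32, 32))
    hva[4:12, 4:12] = 0.9  # synthetic attention blob
    boxes = {"Consolidation": (4, 4, 12, 12), "Pleural effusion": (20, 20, 30, 30)}
    print(hypotheses(hva, boxes))  # ['Consolidation']
```

A coverage ratio (rather than IoU) is used here so that a large, diffuse attention region does not penalize a small, well-attended finding; the paper's actual mapping rule may differ.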