Fake It Till You Make It: Using Synthetic Data and Domain Knowledge for Improved Text-Based Learning for LGE Detection

Athira J Jacob, Puneet Sharma, Daniel Rueckert

TL;DR

This paper tackles the challenge of detecting myocardial hyperenhancement in late gadolinium enhancement (LGE) MRI when annotated data are scarce. It introduces a text-guided, CLIP-based framework, augmented with synthetic scar generation and anatomy-informed image normalization, that learns from clinical reports alone. Key contributions include a scalable synthetic-data pipeline, a caption loss for fine-grained supervision, and pretraining of the vision encoder on a related LGE task; together these achieve a balanced accuracy of 0.83 on held-out data. The approach enables slice-level interpretability and shows promise for efficient deployment in data-sparse clinical settings.
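Balanced accuracy, the metric quoted above, averages sensitivity and specificity, so a model cannot score well simply by predicting the majority class (e.g. "no hyperenhancement"). The sketch below shows the standard binary definition; the function name and the 1 = LGE present / 0 = absent label encoding are illustrative assumptions, not taken from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Binary balanced accuracy: mean of sensitivity and specificity.

    Assumes labels are 1 (hyperenhancement present) or 0 (absent) and that
    both classes occur in y_true; otherwise a ZeroDivisionError is raised.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed scars
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# A classifier that always predicts "absent" on an imbalanced set:
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
print(balanced_accuracy(y_true, y_pred))  # 0.5, although plain accuracy is 0.8
```

Plain accuracy would reward the trivial classifier with 0.8 here, which is why a balanced metric is the more informative choice for an imbalanced detection task.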

Abstract

Detection of hyperenhancement from cardiac LGE MRI images is a complex task requiring significant clinical expertise. Although deep learning-based models have shown promising results for the task, they require large amounts of data with fine-grained annotations. Clinical reports generated for cardiac MR studies contain rich, clinically relevant information, including the location, extent and etiology of any scars present. Although recently developed CLIP-based training enables pretraining models with image-text pairs, it requires large amounts of data and additional finetuning strategies for downstream tasks. In this study, we use various strategies rooted in domain knowledge to train a model for LGE detection solely using text from clinical reports, on a relatively small clinical cohort of 965 patients. We improve performance through synthetic data augmentation, systematically creating scar images and associated text. In addition, we standardize the orientation of the images in an anatomy-informed way to enable better alignment of spatial and text features. We also use a captioning loss to enable fine-grained supervision and explore the effect of pretraining the vision encoder on performance. Finally, ablation studies are carried out to elucidate the contribution of each design component to the overall performance of the model.
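CLIP-style training aligns image and text embeddings with a symmetric contrastive (InfoNCE) loss: in a batch of N image/report pairs, each image should score its own report higher than the other N - 1 reports, and vice versa. The NumPy sketch below shows the standard CLIP formulation only; the function name and temperature value are assumptions, and the paper's exact objective (including how its captioning loss is combined with this term) may differ.

```python
import numpy as np


def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of N matched (image, text) pairs.

    image_emb, text_emb: arrays of shape (N, D); row i of each is a pair.
    """
    # L2-normalise so that dot products are cosine similarities.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N); true pairs lie on the diagonal

    def diag_cross_entropy(l):
        # Row-wise log-softmax, stabilised by subtracting the row maximum.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Image-to-text uses the rows, text-to-image uses the columns.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
print(clip_contrastive_loss(emb, emb))        # small: every pair is matched
print(clip_contrastive_loss(emb, emb[::-1]))  # larger: pairs are mismatched
```

Averaging the two directions keeps the objective symmetric, so neither the vision nor the text encoder is favoured during alignment.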

Paper Structure

This paper contains 27 sections, 3 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Overview of the proposed model
  • Figure 2: Synthetic Data Generation: a) Generation pipeline, b) Examples of synthetically generated images and corresponding captions. The text may refer to a region other than that of the paired image (image 3).
  • Figure 3: Anatomically informed normalization of LV orientation
  • Figure 4: Qualitative results on real clinical data. Model predictions for each image are shown alongside the patient-level ground-truth (GT) text.
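Figure 2 describes a pipeline that systematically pairs synthetic scar images with matching text. The toy sketch below illustrates the idea on a single idealized short-axis slice: it draws a myocardial ring, brightens an angular sector chosen by wall name and transmurality, and emits a template caption. The ring geometry, intensities, the hypothetical `WALL_SECTORS` mapping and the caption template are all illustrative assumptions; the paper's generator is not reproduced here and likely works on real images and anatomical segmentations.

```python
import numpy as np

# Hypothetical wall-to-angle mapping (degrees, counter-clockwise from the +x
# image axis), assuming a conventional short-axis view: anterior up, septum
# left. A real pipeline would derive this from anatomical landmarks.
WALL_SECTORS = {
    "anterior": (45, 135),
    "septal": (135, 225),
    "inferior": (225, 315),
    "lateral": (-45, 45),
}


def synth_scar_slice(size=64, wall="inferior", transmural=True,
                     scar_intensity=1.0, rng=None):
    """Return (image, scar_mask, caption) for one toy LGE-like slice."""
    rng = np.random.default_rng() if rng is None else rng
    yy, xx = np.mgrid[:size, :size]
    cy = cx = (size - 1) / 2
    r = np.hypot(yy - cy, xx - cx)
    theta = np.degrees(np.arctan2(cy - yy, xx - cx))  # image rows point down
    r_in, r_out = 0.2 * size, 0.3 * size
    myo = (r >= r_in) & (r <= r_out)  # idealized myocardial ring

    # Dark, noisy background (nulled myocardium) with a bright blood pool.
    image = 0.1 + 0.02 * rng.standard_normal((size, size))
    image[r < r_in] = 0.6

    # Angular sector for the requested wall, handling wrap-around at +/-180.
    lo, hi = WALL_SECTORS[wall]
    in_sector = ((theta - lo) % 360) < (hi - lo)

    # Transmural scar spans the full wall; subendocardial only the inner half.
    depth = r_out if transmural else r_in + 0.5 * (r_out - r_in)
    scar = myo & in_sector & (r <= depth)
    image[scar] = scar_intensity

    extent = "transmural" if transmural else "subendocardial"
    caption = f"{extent.capitalize()} hyperenhancement in the {wall} wall."
    return image, scar, caption


img, scar, caption = synth_scar_slice(wall="inferior", rng=np.random.default_rng(0))
print(caption)  # Transmural hyperenhancement in the inferior wall.
```

Because the caption is generated from the same parameters that placed the scar, every synthetic pair is correctly labelled by construction, which is what makes this kind of augmentation scalable.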