
Synthetic Data in Radiological Imaging: Current State and Future Outlook

Elena Sizikova, Andreu Badal, Jana G. Delfino, Miguel Lago, Brandon Nelson, Niloufar Saharkhiz, Berkman Sahiner, Ghada Zamzmi, Aldo Badano

TL;DR

The paper surveys the role of synthetic data in radiological imaging to overcome data availability and privacy constraints for AI. It categorizes generation techniques into statistical, physical, and hybrid approaches, and discusses disease modeling and evaluation metrics. It highlights real-world applications including algorithm development, testing, in silico trials, and privacy-preserving data sharing, with exemplars like VICTRE and various synthetic datasets. It also addresses limitations, challenges in validation, and regulatory considerations, arguing that continued advances are needed to close the realism and governance gaps.

Abstract

A key challenge for the development and deployment of artificial intelligence (AI) solutions in radiology is solving the associated data limitations. Obtaining sufficient and representative patient datasets with appropriate annotations may be burdensome due to high acquisition cost, safety limitations, patient privacy restrictions, or low disease prevalence rates. In silico data offers a number of potential advantages over patient data, such as diminished patient harm, reduced cost, simplified data acquisition, scalability, improved quality assurance testing, and a mitigation approach to data imbalances. We summarize key research trends and practical uses for synthetically generated data for radiological applications of AI. Specifically, we discuss different types of techniques for generating synthetic examples, their main application areas, and related quality control assessment issues. We also discuss current approaches for evaluating synthetic imaging data. Overall, synthetic data holds great promise in addressing current data availability gaps, but additional work is needed before its full potential is realized.

Paper Structure

This paper contains 22 sections, 1 figure, 2 tables.

Figures (1)

  • Figure 1: Properties of the digital object and acquisition system models can be controlled during the synthetic data generation process. Shown is the variation in imaging dose (number of Monte Carlo histories) for digital mammography images simulated with the VICTRE pipeline [Badano2018victre], using a digital breast model [graffNewOpensourceMultimodality2016] with fatty breast density and a mass model [de2015computational] with a 5 mm radius (adapted from [sizikova2023knowledge]).
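Figure 1 ties imaging dose to the number of Monte Carlo histories. One way to see why this controls image quality: each history behaves like an independent photon, so the simulated signal follows counting statistics and its relative noise falls roughly as 1/sqrt(N). The sketch below is a toy model under stated assumptions, not the VICTRE pipeline. It treats each history as a photon crossing a uniform slab with Beer-Lambert transmission probability exp(-mu*t), ignores scatter and detector response, and uses illustrative values for mu and t. The function names `simulate_pixel_counts` and `relative_noise` are hypothetical.

```python
import math
import random


def simulate_pixel_counts(histories_per_pixel, n_pixels, mu_per_cm, thickness_cm, seed=0):
    """Toy Monte Carlo detector (hypothetical helper, not part of VICTRE).

    Each history is one photon aimed at a pixel. It reaches the detector with
    probability exp(-mu * t), i.e. Beer-Lambert attenuation through a uniform
    slab. Scatter, energy spectra and detector blur are deliberately ignored.
    """
    rng = random.Random(seed)
    p_transmit = math.exp(-mu_per_cm * thickness_cm)
    counts = []
    for _ in range(n_pixels):
        # Count how many of this pixel's histories survive the slab.
        detected = sum(1 for _ in range(histories_per_pixel) if rng.random() < p_transmit)
        counts.append(detected)
    return counts, p_transmit


def relative_noise(counts):
    """Coefficient of variation (sample std / mean) across pixels."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return math.sqrt(var) / mean


# Illustrative, hypothetical parameters; not taken from VICTRE or the paper.
results = {}
for n_hist in (100, 400, 1600, 6400):
    counts, p = simulate_pixel_counts(n_hist, n_pixels=200, mu_per_cm=0.5, thickness_cm=4.0)
    measured = relative_noise(counts)
    # Binomial prediction for the relative noise of a single pixel's count.
    expected = math.sqrt((1 - p) / (n_hist * p))
    results[n_hist] = (measured, expected)
    print(f"{n_hist:5d} histories/pixel: relative noise {measured:.3f} (theory {expected:.3f})")
```

In this toy model, each fourfold increase in histories roughly halves the relative noise. A full simulator such as VICTRE layers scatter, detector physics, and anatomical structure on top of this basic counting noise, so the numbers here only illustrate the trend, not realistic mammography noise levels.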