Synthetic Data in Radiological Imaging: Current State and Future Outlook
Elena Sizikova, Andreu Badal, Jana G. Delfino, Miguel Lago, Brandon Nelson, Niloufar Saharkhiz, Berkman Sahiner, Ghada Zamzmi, Aldo Badano
TL;DR
The paper surveys the role of synthetic data in radiological imaging to overcome data availability and privacy constraints for AI. It categorizes generation techniques into statistical, physical, and hybrid approaches, and discusses disease modeling and evaluation metrics. It highlights real-world applications including algorithm development, testing, in silico trials, and privacy-preserving data sharing, with exemplars like VICTRE and various synthetic datasets. It also addresses limitations, challenges in validation, and regulatory considerations, arguing that continued advances are needed to close the realism and governance gaps.
Abstract
A key challenge for the development and deployment of artificial intelligence (AI) solutions in radiology is solving the associated data limitations. Obtaining sufficient and representative patient datasets with appropriate annotations may be burdensome due to high acquisition cost, safety limitations, patient privacy restrictions, or low disease prevalence rates. In silico data offers a number of potential advantages over patient data, such as reduced patient harm, lower cost, simplified data acquisition, scalability, improved quality assurance testing, and mitigation of data imbalances. We summarize key research trends and practical uses of synthetically generated data in radiological applications of AI. Specifically, we discuss different types of techniques for generating synthetic examples, their main application areas, and related quality control assessment issues. We also discuss current approaches for evaluating synthetic imaging data. Overall, synthetic data holds great promise in addressing current data availability gaps, but additional work is needed before its full potential is realized.
