Position Paper: Building Trust in Synthetic Data for Clinical AI

Krishan Agyakari Raja Babu, Supriti Mulay, Om Prabhu, Mohanasankar Sivaprakasam

TL;DR

The paper investigates the trust barrier for synthetic medical data in clinical AI, using brain tumor segmentation as a testbed. It takes real data from BraTS 2021 and diffusion-based synthetic counterparts conditioned on semantic maps, trains a SwinUNETR-based 3D segmentation model on varying real:synthetic data mixes, and assesses performance with the Dice score across five trust factors. The findings show that a balanced mix of real and synthetic data ($\alpha \approx 0.5$) yields similar real and synthetic performance for overall metrics and larger ROIs, while smaller regions remain challenging, which highlights the limits of synthetic data's ability to mirror real-world variability. The work provides a practical framework linking data quality, diversity, and proportion to trust, offers guidance for responsibly integrating synthetic data into clinical workflows, and emphasizes transparency and explainability in generation processes.
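
To make the two quantities in this summary concrete, here is a minimal sketch of the Dice score on binary masks and of assembling a training set with a given real:synthetic proportion. It is an illustration, not the authors' pipeline: the helper names `dice_score` and `mix_cases` are hypothetical, and treating $\alpha$ as the synthetic fraction of the training set is an assumption, since the summary does not define it.

```python
import numpy as np


def dice_score(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: scored as perfect agreement (a convention, assumed here).
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / denom)


def mix_cases(real_cases, synthetic_cases, alpha, n_total, seed=0):
    """Sample n_total training cases, of which round(alpha * n_total) are synthetic.

    ASSUMPTION: alpha is the synthetic fraction; the source does not define it.
    """
    rng = np.random.default_rng(seed)
    n_syn = int(round(alpha * n_total))
    n_real = n_total - n_syn
    real_idx = rng.choice(len(real_cases), size=n_real, replace=False)
    syn_idx = rng.choice(len(synthetic_cases), size=n_syn, replace=False)
    return [real_cases[i] for i in real_idx] + [synthetic_cases[i] for i in syn_idx]


# Toy usage with placeholder case identifiers (not real BraTS IDs).
real = [f"real_{i}" for i in range(10)]
synthetic = [f"syn_{i}" for i in range(10)]
balanced = mix_cases(real, synthetic, alpha=0.5, n_total=10)
score = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

Because Dice normalizes by the combined mask size, a few misclassified voxels cost far more on a small region than on a large one, which is consistent with the observation that smaller ROIs remain the harder case.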

Abstract

Deep generative models and synthetic medical data have shown significant promise in addressing key challenges in healthcare, such as privacy concerns, data bias, and the scarcity of realistic datasets. While research in this area has grown rapidly and demonstrated substantial theoretical potential, its practical adoption in clinical settings remains limited. Despite the benefits synthetic data offers, questions surrounding its reliability and credibility persist, leading to a lack of trust among clinicians. This position paper argues that fostering trust in synthetic medical data is crucial for its clinical adoption. It aims to spark a discussion on the viability of synthetic medical data in clinical practice, particularly in the context of current advancements in AI. We present empirical evidence from brain tumor segmentation to demonstrate that the quality, diversity, and proportion of synthetic data directly impact trust in clinical AI models. Our findings provide insights to improve the deployment and acceptance of synthetic data-driven AI systems in real-world clinical workflows.

Paper Structure

This paper contains 21 sections, 10 equations, 8 figures.

Figures (8)

  • Figure 1: Illustration of the Proposed Method
  • Figure 2: Comparison of Real (R) and Synthetic (S) Brain Tumor Images.
  • Figure 3: Volumetric Distribution of Tumor Regions
  • Figure 4: Visualization of Trust Factor $T_1$
  • Figure 5: Visualization of Trust Factor $T_2$
  • ...and 3 more figures