Hyperparameter-Free Medical Image Synthesis for Sharing Data and Improving Site-Specific Segmentation

Alexander Chebykin, Peter A. N. Bosman, Tanja Alderliesten

TL;DR

HyFree-S3 introduces a hyperparameter-free, asynchronous distributed framework that generates and shares synthetic medical images to improve site-specific segmentation without exchanging real data. The method integrates a hyperparameter-free StyleGAN2 data generator with nnU-Net-based segmentation, training a general model on pooled synthetic data and refining it locally. Across cervical MRI, chest X-ray, and polyp datasets, HyFree-S3 nearly matches centralized real-data performance and shows robust gains over purely local training, while mitigating memorization and privacy risks through an embedding-based filtering step. This approach enables privacy-preserving, scalable data sharing and model improvement across multiple sites with minimal coordination and no manual hyperparameter tuning.

Abstract

Sharing synthetic medical images is a promising alternative to sharing real images that can improve patient privacy and data security. To get good results, existing methods for medical image synthesis must be manually adjusted when they are applied to unseen data. To remove this manual burden, we introduce a Hyperparameter-Free distributed learning method for automatic medical image Synthesis, Sharing, and Segmentation called HyFree-S3. For three diverse segmentation settings (pelvic MRIs, lung X-rays, polyp photos), the use of HyFree-S3 results in improved performance over training only with site-specific data (in the majority of cases). The hyperparameter-free nature of the method should make data synthesis and sharing easier, potentially leading to an increase in the quantity of available data and consequently the quality of the models trained that may ultimately be applied in the clinic. Our code is available at https://github.com/AwesomeLemon/HyFree-S3

Paper Structure

This paper contains 25 sections, 9 figures, 9 tables.

Figures (9)

  • Figure 1: Overview of HyFree-S3 for two sites. Synthetic datasets are generated at each site independently, merged at a central site, and used in training a general segmentation model. That model is copied to all sites and independently fine-tuned on the local data. All models automatically adapt to the properties of the data.
  • Figure 2: Results for the Cervix data: U-Nets trained with the settings specified in the experimental setup section and evaluated on sites A and B (5 folds).
  • Figure 3: DS for the Lung (left) and Polyp (right) data: U-Nets trained with the settings specified in the experimental setup section and evaluated on sites A and B (5 folds).
  • Figure 4: DS improvement of syn-real over real as more sites are added (Lung).
  • Figure 5: Left: real images that are the closest to any synthetic image and the two closest synthetic images (including the distance to the real image). Right: a distribution of distances to the nearest neighbor for (top) two subsets of real images or (bottom) synthetic and real images, and the 5th percentile of the real-to-real distances.
  • ...and 4 more figures
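
The embedding-based filtering mentioned in the TL;DR, together with the nearest-neighbor distances described for Figure 5, can be illustrated with a short sketch. This is not the authors' implementation: the function names `nearest_distances` and `filter_synthetic` are hypothetical, the embeddings are assumed to be precomputed by some unspecified image encoder, Euclidean distance is assumed, and using the 5th percentile of real-to-real distances as a rejection threshold is an assumption inferred from the Figure 5 caption.

```python
import numpy as np


def nearest_distances(queries, references):
    """Euclidean distance from each query embedding to its nearest reference."""
    diffs = queries[:, None, :] - references[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)


def filter_synthetic(real_emb, syn_emb, percentile=5.0, seed=0):
    """Flag synthetic samples that lie suspiciously close to a real sample.

    Assumption: the threshold is the given percentile of nearest-neighbor
    distances between two random halves of the real embeddings.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(real_emb))
    half = len(real_emb) // 2
    part_a, part_b = real_emb[order[:half]], real_emb[order[half:]]

    # Reference distribution: how close distinct real images usually are.
    threshold = np.percentile(nearest_distances(part_a, part_b), percentile)

    # A synthetic image closer to a real one than this is treated as a
    # possible memorized copy and is dropped.
    keep = nearest_distances(syn_emb, real_emb) >= threshold
    return keep, threshold


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    real = rng.normal(size=(40, 8))       # stand-in for real-image embeddings
    synthetic = rng.normal(size=(20, 8))  # stand-in for synthetic embeddings
    keep, threshold = filter_synthetic(real, synthetic)
    print(f"threshold={threshold:.3f}, kept {keep.sum()} of {len(keep)}")
```

Calibrating the threshold on real-to-real distances keeps the sketch free of a hand-tuned cutoff, which fits the hyperparameter-free framing, though the percentile itself remains a fixed choice here.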