Federated Data Model
Xiao Chen, Shunan Zhang, Eric Z. Chen, Yikang Liu, Lin Zhao, Terrence Chen, Shanhui Sun
TL;DR
Problem: building robust medical AI is hindered by domain shift and by privacy constraints that limit data sharing across institutions. Approach: the Federated Data Model (FDM) uses diffusion models to learn a site's local data distribution and to generate synthetic data that can be shared across sites for training downstream models without exposing raw data. Key findings: augmenting training with synthetic data from remote sites improves cross-site generalization, while local performance remains stable or improves slightly because the synthetic data also acts as augmentation. Practical impact: FDM enables privacy-respecting, scalable, multi-site AI deployment in regulated environments, particularly for medical imaging.
Abstract
In artificial intelligence (AI), especially deep learning, data diversity and volume play a pivotal role in model development. However, training a robust deep learning model often faces challenges due to data privacy, regulations, and the difficulty of sharing data between different locations, especially for medical applications. To address this, we developed a method called the Federated Data Model (FDM). This method uses diffusion models to learn the characteristics of data at one site and then creates synthetic data that can be used at another site without sharing the actual data. We tested this approach with a medical image segmentation task, focusing on cardiac magnetic resonance images from different hospitals. Our results show that models trained with this method perform well both on the data they were originally trained on and on data from other sites. This approach offers a promising way to train accurate and privacy-respecting AI models across different locations.
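The data flow described above can be illustrated with a toy sketch. It is not the FDM implementation: a one-dimensional Gaussian fit is swapped in for the diffusion model, scalar values stand in for cardiac MR images, and all names (`ToyGenerator`, `fit_generator`, `build_training_set`) are hypothetical. Whether a site shares synthetic samples or the generative model itself is also an assumption here; the sketch shares samples.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class ToyGenerator:
    """Stand-in for a site's diffusion model (assumption: a 1-D Gaussian fit)."""
    mean: float
    std: float

    def sample(self, n: int, rng: random.Random) -> list[float]:
        # Draw n synthetic values from the learned distribution.
        return [rng.gauss(self.mean, self.std) for _ in range(n)]


def fit_generator(local_data: list[float]) -> ToyGenerator:
    # Runs inside the owning site; raw data is only read here.
    return ToyGenerator(statistics.fmean(local_data), statistics.stdev(local_data))


def build_training_set(local_data: list[float], remote_synthetic: list[float]) -> list[float]:
    # Remote augmentation: real local data plus synthetic data from another site.
    return list(local_data) + list(remote_synthetic)


rng = random.Random(0)
site_a_raw = [rng.gauss(2.0, 0.5) for _ in range(200)]   # stays at site A
site_b_raw = [rng.gauss(-1.0, 0.5) for _ in range(200)]  # stays at site B

# Site A learns its local distribution and exports synthetic samples only.
gen_a = fit_generator(site_a_raw)
synthetic_a = gen_a.sample(100, rng)

# Site B trains its downstream model on local data plus site A's synthetic data.
train_b = build_training_set(site_b_raw, synthetic_a)
```

In this sketch `train_b` holds site B's 200 real values followed by 100 synthetic values that follow site A's distribution, so a downstream model at site B sees both domains while `site_a_raw` is never transferred.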
