EchoNet-Synthetic: Privacy-preserving Video Generation for Safe Medical Data Sharing
Hadrien Reynaud, Qingjie Meng, Mischa Dombrowski, Arijit Ghosh, Thomas Day, Alberto Gomez, Paul Leeson, Bernhard Kainz
TL;DR
The paper tackles the challenge of sharing medical video data while protecting patient privacy, proposing a fully synthetic, de-identified echocardiogram generation workflow. It builds a diffusion-based pipeline operating in a latent space, incorporating a privacy filter and a long-video stitching mechanism to produce high-fidelity, temporally coherent samples. EchoNet-Synthetic is released as a fully synthetic echocardiography dataset with paired ejection fraction labels and is validated via downstream regression performance on real test data. Results show dataset fidelity comparable to that of real data, substantial gains in generation speed over baselines, and effective privacy control, highlighting the approach's practical potential for safe medical data sharing and research replication.
Abstract
To make medical datasets accessible without sharing sensitive patient information, we introduce a novel end-to-end approach for generative de-identification of dynamic medical imaging data. Until now, generative methods have faced constraints in terms of fidelity, spatio-temporal coherence, and the length of generation, failing to capture the complete details of dataset distributions. We present a model designed to produce high-fidelity, long and complete data samples with near-real-time efficiency and explore our approach on a challenging task: generating echocardiogram videos. We develop our generation method based on diffusion models and introduce a protocol for medical video dataset anonymization. As an exemplar, we present EchoNet-Synthetic, a fully synthetic, privacy-compliant echocardiogram dataset with paired ejection fraction labels. As part of our de-identification protocol, we evaluate the quality of the generated dataset and propose using clinical downstream tasks as a quality measure in addition to widely used but potentially biased image quality metrics. Experimental outcomes demonstrate that EchoNet-Synthetic achieves dataset fidelity comparable to that of the real dataset, effectively supporting the ejection fraction regression task. Code, weights and dataset are available at https://github.com/HReynaud/EchoNet-Synthetic.
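To make the downstream-task evaluation concrete, the following is a minimal sketch of how an ejection fraction regressor trained on synthetic videos might be scored against real test labels. The abstract does not name the metrics used, so the choice of MAE, RMSE and R² is an assumption, and the function name `ef_regression_metrics` and the sample values are hypothetical illustrations rather than the authors' code or results.

```python
import math


def ef_regression_metrics(true_ef, pred_ef):
    """Return (MAE, RMSE, R2) for predicted vs. reference ejection fractions (in %).

    Assumption: these three metrics stand in for "downstream regression
    performance"; the source text does not specify which metrics are used.
    """
    if not true_ef or len(true_ef) != len(pred_ef):
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(true_ef)
    errors = [p - t for t, p in zip(true_ef, pred_ef)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R2 compares the residual error with the variance of the reference labels.
    mean_true = sum(true_ef) / n
    ss_tot = sum((t - mean_true) ** 2 for t in true_ef)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return mae, rmse, r2


# Hypothetical values: labels from a real test set and predictions from a
# regressor trained only on synthetic videos.
real_test_ef = [55.0, 60.0, 35.0, 70.0]
pred_ef = [53.0, 62.0, 40.0, 66.0]

mae, rmse, r2 = ef_regression_metrics(real_test_ef, pred_ef)
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
```

Running the same scoring for a regressor trained on real data and for one trained on synthetic data, both evaluated on the same real test set, gives a task-level comparison that does not depend on image quality metrics alone.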
