EchoFlow: A Foundation Model for Cardiac Ultrasound Image and Video Generation

Hadrien Reynaud, Alberto Gomez, Paul Leeson, Qingjie Meng, Bernhard Kainz

TL;DR

EchoFlow introduces a privacy-preserving pipeline for cardiac ultrasound synthesis by learning a domain-specific latent space via an adversarial variational auto-encoder, then generating both images and videos through latent flow matching. A latent re-identification module screens anatomies to prevent leakage of real patient data, while downstream EF regression demonstrates that models trained exclusively on EchoFlow synthetic data can match real-data performance. The framework is validated across multiple public echocardiogram datasets, showing that scaling model size and training time closes the gap between synthetic and real data in clinical tasks. By releasing both models and synthetic datasets, EchoFlow provides a foundation for privacy-compliant research in medical ultrasound and sets a path for broader synthetic-data utility in healthcare AI.

Abstract

Advances in deep learning have significantly enhanced medical image analysis, yet the availability of large-scale medical datasets remains constrained by patient privacy concerns. We present EchoFlow, a novel framework designed to generate high-quality, privacy-preserving synthetic echocardiogram images and videos. EchoFlow comprises four key components: an adversarial variational autoencoder for defining an efficient latent representation of cardiac ultrasound images, a latent image flow matching model for generating accurate latent echocardiogram images, a latent re-identification model to ensure privacy by filtering images anatomically, and a latent video flow matching model for animating latent images into realistic echocardiogram videos conditioned on ejection fraction. We rigorously evaluate our synthetic datasets on the clinically relevant task of ejection fraction regression and demonstrate, for the first time, that downstream models trained exclusively on EchoFlow-generated synthetic datasets achieve performance parity with models trained on real datasets. We release our models and synthetic datasets, enabling broader, privacy-compliant research in medical ultrasound imaging at https://huggingface.co/spaces/HReynaud/EchoFlow.
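
As a rough illustration of the latent flow matching idea named in the abstract, the sketch below builds the training target for a straight-line path between a noise sample and a data latent, then integrates a velocity field with Euler steps. It is a minimal sketch under stated assumptions, not the authors' implementation: the straight-line path, the 16-dimensional toy latents, the oracle velocity function, and all function names are hypothetical, and the conditioning on ejection fraction is omitted.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight path from x0 (noise) to x1 (data latent) at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight path; it is constant in t (assumed path choice)."""
    return [b - a for a, b in zip(x0, x1)]

def fm_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(v_target)

def euler_sample(velocity_fn, x0, n_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(16)]   # toy stand-in for a noise sample
latent = [random.gauss(0.0, 1.0) for _ in range(16)]  # toy stand-in for an encoded image

# One training pair: a model would receive (x_t, t) and regress v_star.
t = random.random()
x_t = interpolate(noise, latent, t)
v_star = target_velocity(noise, latent)

# An oracle that returns the exact target velocity stands in for a trained network.
oracle = lambda x, time: v_star
sample = euler_sample(oracle, noise)
```

With the oracle in place of a trained network, `sample` lands on `latent` up to floating-point error. In practice a learned model replaces the oracle, and the generated latent would be passed to a decoder to obtain an image.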

Paper Structure

This paper contains 43 sections, 16 equations, 2 figures, 8 tables.

Figures (2)

  • Figure 1: Our EchoFlow framework. From left to right: The image generation model (LIFM), the privacy filter (Re-Identification), the video generation model (LVFM), the decoding stage (A-VAE) and our downstream evaluation. For each step, inputs are shown at the top, process in the middle and output at the bottom.
  • Figure 2: Qualitative comparison of recent echocardiogram synthesis methods in spatial and temporal domain.