
Explainable and Controllable Motion Curve Guided Cardiac Ultrasound Video Generation

Junxuan Yu, Rusi Chen, Yongsong Zhou, Yanlin Chen, Yaofei Duan, Yuhao Huang, Han Zhou, Tan Tao, Xin Yang, Dong Ni

TL;DR

The paper addresses the challenge of generating high-fidelity, motion-consistent echocardiography videos from a single initial frame by introducing motion-curve guided diffusion (ECM). ECM extracts per-structure motion curves $f_c^m$, aligns them with semantic structure features via a Structure-to-Motion module, and injects positional information through Gaussian-mask based attention, enabling controllable edits by scaling or replacing motion curves. The approach demonstrates state-of-the-art fidelity and consistency on three datasets, with ablations showing clear gains from motion curves, the alignment module, and the position-aware attention. This method has practical impact for clinical teaching and ML training in settings with limited ultrasound video data, offering interpretable and adjustable video synthesis capabilities.
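The curve-editing idea can be illustrated with a small sketch. The summary does not say how ECM represents its curves, so this assumes each structure's motion curve is a plain list with one value per frame, and that "scaling" means shrinking or enlarging the excursion around the first-frame value. The structure names, `scale_curve`, and `edit_curves` are hypothetical and are not taken from the authors' code.

```python
from typing import Dict, List

# Hypothetical representation: structure name -> one motion value per frame.
MotionCurves = Dict[str, List[float]]


def scale_curve(curve: List[float], factor: float) -> List[float]:
    """Scale a curve's excursion around its first-frame value (an assumed convention)."""
    base = curve[0]
    return [base + factor * (value - base) for value in curve]


def edit_curves(curves: MotionCurves, structure: str, factor: float) -> MotionCurves:
    """Return a copy in which only `structure` has its motion amplitude scaled."""
    edited = {name: list(values) for name, values in curves.items()}
    edited[structure] = scale_curve(curves[structure], factor)
    return edited


# Illustrative values only; these are not measurements from any dataset.
curves = {
    "left_ventricle": [1.0, 0.8, 0.6, 0.8, 1.0],
    "left_atrium": [0.5, 0.6, 0.7, 0.6, 0.5],
}

# Halve the left-ventricle motion amplitude and leave the other structure untouched.
weaker = edit_curves(curves, "left_ventricle", 0.5)
```

Under these assumptions, replacing a curve is just assigning a different list to one key. The edited dictionary would then be passed to the generator as the motion condition alongside the initial frame.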

Abstract

Echocardiography video is a primary modality for diagnosing heart diseases, but limited data availability poses challenges for both clinical teaching and machine learning training. Recently, video generative models have emerged as a promising strategy to alleviate this issue. However, previous methods often relied on holistic conditions during generation, which hinders flexible motion control over specific cardiac structures. In this context, we propose an explainable and controllable method for echocardiography video generation that takes an initial frame and a motion curve as guidance. Our contributions are three-fold. First, we extract motion information from each heart substructure to construct motion curves, enabling the diffusion model to synthesize customized echocardiography videos by modifying these curves. Second, we propose the structure-to-motion alignment module, which maps semantic features onto motion curves across cardiac structures. Third, the position-aware attention mechanism is designed to enhance video consistency using Gaussian masks with structural position information. Extensive experiments on three echocardiography datasets show that our method outperforms others in fidelity and consistency. The full code will be released at https://github.com/mlmi-2024-72/ECM.
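A Gaussian mask carrying structural position information can be sketched as follows. The abstract does not specify how the masks are built or how they enter the attention computation, so this only shows the generic construction: a 2D Gaussian that peaks at an assumed structure centre. The function name, the grid size, and the `sigma` value are illustrative assumptions.

```python
import math
from typing import List


def gaussian_mask(height: int, width: int, cy: float, cx: float, sigma: float) -> List[List[float]]:
    """Build a height x width mask that is 1.0 at (cy, cx) and decays with distance."""
    return [
        [
            math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


# Hypothetical 8x8 feature grid with a structure centred at row 3, column 4.
mask = gaussian_mask(8, 8, cy=3, cx=4, sigma=1.5)
```

One plausible use, again an assumption rather than the paper's stated design, is to weight or bias attention scores with such a mask so that positions near a structure's centre contribute more.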

Paper Structure

This paper contains 9 sections, 4 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Workflow of ECM. Input: an initial frame and motion curves of each cardiac structure. Output: a generated echocardiography video.
  • Figure 2: The overall pipeline of the proposed ECM.
  • Figure 3: Illustration of structure-to-motion alignment module.
  • Figure 4: Visualization results of generated videos on two datasets.