Latent Motion Profiling for Annotation-free Cardiac Phase Detection in Adult and Fetal Echocardiography Videos

Yingyu Yang, Qianye Yang, Kangning Cui, Can Peng, Elena D'Alberti, Netzahualcoyotl Hernandez-Cruz, Olga Patey, Aris T. Papageorghiou, J. Alison Noble

TL;DR

This work tackles the annotation bottleneck in automatic cardiac phase detection by introducing Latent Motion Profiling (LMP), a self-supervised framework that learns interpretable latent motion trajectories from 4-chamber echocardiography videos. The frame-level latent code is decomposed into a static anatomy component $z^s$ and a low-rank motion component $z^m_t = \mathbf{E}\mathbf{a}_t$ with $K=2$, so the model learns two physiologically meaningful motion directions (septal and lateral wall) and encodes cardiorespiratory dynamics without ED/ES labels. ED and ES are then identified as extremes of the learned trajectory: the motion coefficients are projected onto their principal direction (a PCA-based step) and the extremes of the projected signal are taken as the phase indices. On adult EchoNet-Dynamic data, the method achieves an MAE of $3.0$ frames for ED and $2.0$ frames for ES, matching supervised approaches. On fetal data, the model generalizes well with transfer learning, reaching $1.46$ frames (ED) and $1.74$ frames (ES) in matched-pair MAE and around 95–96% frame-level detection. The framework offers a scalable, annotation-free pathway for cardiac motion analysis across clinical populations and lays groundwork for extensions to view-invariant modelling and pathology-aware trajectory clustering, with code to be released publicly.
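The decomposition described above can be illustrated with a small numerical sketch. It only shows the algebra of $z_t = z^s + \mathbf{E}\mathbf{a}_t$, not the learned encoder or the training procedure. The dimensions `D` and `T`, the random values, the QR-based orthonormalization, and the helper names are all assumptions made for illustration; only $K=2$ and the form of the decomposition come from the text.

```python
import numpy as np


def orthonormal_basis(raw: np.ndarray) -> np.ndarray:
    """Return a (D, K) matrix with orthonormal columns spanning the columns of raw."""
    q, _ = np.linalg.qr(raw)  # reduced QR, so q has shape (D, K)
    return q


def compose_latents(z_s: np.ndarray, E: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Build z_t = z^s + E a_t for every frame; a is (T, K), result is (T, D)."""
    return z_s[None, :] + a @ E.T


def recover_coefficients(z: np.ndarray, z_s: np.ndarray, E: np.ndarray) -> np.ndarray:
    """Invert the decomposition: a_t = E^T (z_t - z^s), valid for orthonormal E."""
    return (z - z_s[None, :]) @ E


# Illustrative sizes and values (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
D, K, T = 16, 2, 40
E = orthonormal_basis(rng.standard_normal((D, K)))  # motion subspace
z_s = rng.standard_normal(D)                        # static anatomy code

# A synthetic periodic trajectory standing in for the motion coefficients a_t.
phase = np.linspace(0.0, 4.0 * np.pi, T)
a = np.stack([np.cos(phase), 0.5 * np.sin(phase)], axis=1)

z = compose_latents(z_s, E, a)
a_rec = recover_coefficients(z, z_s, E)
```

Because the columns of `E` are orthonormal, the per-frame coefficients are recovered exactly from the latent codes, which is why the whole motion of a video can be summarized by a trajectory in $\mathbb{R}^K$.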

Abstract

The identification of cardiac phase is an essential step for analysis and diagnosis of cardiac function. Automatic methods, especially data-driven methods for cardiac phase detection, typically require extensive annotations, which are time-consuming and labor-intensive to obtain. In this paper, we present an unsupervised framework for end-diastole (ED) and end-systole (ES) detection through self-supervised learning of latent cardiac motion trajectories from 4-chamber-view echocardiography videos. Our method eliminates the need for manual annotations, including ED and ES indices, segmentation, or volumetric measurements, by training a reconstruction model to encode interpretable spatiotemporal motion patterns. Evaluated on the EchoNet-Dynamic benchmark, the approach achieves a mean absolute error (MAE) of 3 frames (58.3 ms) for ED and 2 frames (38.8 ms) for ES detection, matching state-of-the-art supervised methods. Extended to fetal echocardiography, the model demonstrates robust performance with an MAE of 1.46 frames (20.7 ms) for ED and 1.74 frames (25.3 ms) for ES, even though the fetal heart model is built using non-standardized heart views due to fetal heart positioning variability. Our results demonstrate the potential of the proposed latent motion trajectory strategy for cardiac phase detection in adult and fetal echocardiography. This work advances unsupervised cardiac motion analysis, offering a scalable solution for clinical populations lacking annotated data. Code will be released at https://github.com/YingyuYyy/CardiacPhase.

Paper Structure

This paper contains 13 sections, 2 equations, 3 figures, 2 tables, 1 algorithm.

Figures (3)

  • Figure 1: Latent Cardiac Motion Profiling via Frame-wise Reconstruction. (a) Structure-motion decomposition: Input frame $x_t$ is encoded as $z_t = z^s + \mathbf{E}\bm{a}_t$, where $z^s \in \mathbb{R}^D$ represents static anatomy and $\mathbf{E}\bm{a}_t$ models motion in orthogonal subspace $\mathbf{E} = [\bm{e}_1, \dots, \bm{e}_K]$. (b) Motion trajectory: Temporal evolution of $\{\bm{a}_t\}$ in $\mathbb{R}^K$ ($K = 2$ in our case) reveals ED/ES phases as geometric landmarks, enabling unsupervised detection through trajectory analysis.
  • Figure 2: Latent motion disentanglement visualization in the motion subspace spanned by $\mathbf{E} = [\bm{e}_1, \bm{e}_2]$. (a) Three latent motion trajectories: two axis-specific motions (orange and blue lines) and their combination (red line). (b) Reconstruction of the red trajectory showing cardiac contraction, with left ventricle volume decreasing and the septum and lateral wall moving (red arrows). (c) Reconstruction difference between red and blue trajectories, with minimal differences in the dashed region indicating that the $\bm{e}_1$ axis correlates with septal movement (blue arrow). (d) Reconstruction difference between red and orange trajectories, with minimal differences in the dashed region indicating that the $\bm{e}_2$ axis correlates with lateral movement (orange arrow).
  • Figure 3: Latent motion trajectory for ED/ES indices detection. (a-c) A fetal test example. (a) Align fetal 4CH view to canonical apical orientation. (b) 2D latent motion trajectory. (c) Following Algorithm 1, the motion trajectory is projected onto the principal direction. Peaks of the projected trajectory indicate ES and valleys indicate ED. Predicted ED/ES image frames are shown. (d-e) An adult test example.
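
The projection step described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Algorithm 1: the trajectory is synthetic, the peak finder is `scipy.signal.find_peaks`, and the `min_distance` value is arbitrary. A principal direction is only defined up to sign, and how the paper orients it is not stated here, so the `reference` argument is a hypothetical stand-in for that step.

```python
from typing import Optional, Tuple

import numpy as np
from scipy.signal import find_peaks


def project_on_principal_direction(
    a: np.ndarray, reference: Optional[np.ndarray] = None
) -> np.ndarray:
    """Project a (T, K) trajectory onto its first principal direction."""
    centered = a - a.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    # Hypothetical sign convention: point the direction along `reference`.
    if reference is not None and direction @ reference < 0:
        direction = -direction
    return centered @ direction


def detect_extremes(
    projection: np.ndarray, min_distance: int = 5
) -> Tuple[np.ndarray, np.ndarray]:
    """Return (peak indices, valley indices) of a 1D projected trajectory."""
    peaks, _ = find_peaks(projection, distance=min_distance)
    valleys, _ = find_peaks(-projection, distance=min_distance)
    return peaks, valleys


# Synthetic 2D trajectory: three cycles of 20 frames each (illustrative only).
t = np.arange(60)
phase = 2.0 * np.pi * t / 20.0
trajectory = np.stack([np.cos(phase), 0.3 * np.sin(phase)], axis=1)

projection = project_on_principal_direction(
    trajectory, reference=np.array([1.0, 0.0])
)
# Following the caption: peaks are read as ES, valleys as ED.
es_frames, ed_frames = detect_extremes(projection)
```

In this toy example the first and last frames are never reported, because `find_peaks` only returns interior local maxima. Whether boundary frames should count as candidates is a design choice this sketch does not address.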