Generative Regression for Left Ventricular Ejection Fraction Estimation from Echocardiography Video

Jinrong Lv, Xun Gong, Zhaohuan Li, Weili Jiang

TL;DR

This paper reframes Left Ventricular Ejection Fraction (LVEF) estimation from echocardiography as a probabilistic inverse problem by modeling the full posterior $p(y\mid \mathbf{x}, \mathbf{a})$ with a generative diffusion framework. The authors introduce MCSDR, a Multimodal Conditional Score-based Diffusion model for Regression, whose score network fuses spatiotemporal echocardiogram features with demographic priors to guide a reverse diffusion process, so that multiple plausible LVEF trajectories can be generated for a single study. The approach reports state-of-the-art performance on EchoNet-Dynamic, EchoNet-Pediatric, and CAMUS, and provides reliability signals via posterior dispersion and generative trajectories that enhance interpretability. The work demonstrates how multimodal priors stabilize ill-posed inference, offering practical, uncertainty-aware support for AI-assisted cardiology diagnostics.
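To make the idea of a "reliability signal via posterior dispersion" concrete, the following is a minimal sketch of how a set of sampled LVEF values for one study could be summarized. The function name `summarize_posterior`, the 95% interval, and the synthetic samples are all assumptions made for illustration; the paper's actual summary statistics may differ.

```python
import numpy as np

def summarize_posterior(samples, level=0.95):
    """Summarize LVEF samples (in percent) drawn for a single study.

    Hypothetical helper: returns the sample mean, the sample standard
    deviation (the dispersion), and a central quantile interval.
    """
    samples = np.asarray(samples, dtype=float)
    lo_q = (1.0 - level) / 2.0
    lo, hi = np.quantile(samples, [lo_q, 1.0 - lo_q])
    return {
        "mean": float(samples.mean()),
        "std": float(samples.std(ddof=1)),
        "interval": (float(lo), float(hi)),
    }

# Synthetic stand-ins for samples a generative model might produce.
rng = np.random.default_rng(0)
confident = rng.normal(60.0, 1.5, size=500)  # tight posterior: clear evidence
ambiguous = rng.normal(45.0, 8.0, size=500)  # wide posterior: noisy evidence

for name, s in [("confident", confident), ("ambiguous", ambiguous)]:
    print(name, summarize_posterior(s))
```

A wide interval would flag the study for closer review, while a narrow one suggests the evidence supports the estimate.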

Abstract

Estimating Left Ventricular Ejection Fraction (LVEF) from echocardiograms constitutes an ill-posed inverse problem. Inherent noise, artifacts, and limited viewing angles introduce ambiguity, where a single video sequence may map not to a unique ground truth, but rather to a distribution of plausible physiological values. Prevailing deep learning approaches typically formulate this task as a standard regression problem that minimizes the Mean Squared Error (MSE). However, this paradigm compels the model to learn the conditional expectation, which may yield misleading predictions when the underlying posterior distribution is multimodal or heavy-tailed -- a common phenomenon in pathological scenarios. In this paper, we investigate the paradigm shift from deterministic regression toward generative regression. We propose the Multimodal Conditional Score-based Diffusion model for Regression (MCSDR), a probabilistic framework designed to model the continuous posterior distribution of LVEF conditioned on echocardiogram videos and patient demographic attribute priors. Extensive experiments conducted on the EchoNet-Dynamic, EchoNet-Pediatric, and CAMUS datasets demonstrate that MCSDR achieves state-of-the-art performance. Notably, qualitative analysis reveals that the generation trajectories of our model exhibit distinct behaviors in cases characterized by high noise or significant physiological variability, thereby offering a novel layer of interpretability for AI-aided diagnosis.
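The claim that MSE training pushes a model toward the conditional expectation can be checked numerically. The sketch below assumes a hypothetical bimodal posterior for one ambiguous video (modes near 35% and 55% LVEF, chosen only for illustration) and searches for the single point prediction that minimizes the MSE against samples from it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bimodal posterior over LVEF (percent) for one ambiguous video.
samples = np.concatenate([
    rng.normal(35.0, 2.0, size=2000),
    rng.normal(55.0, 2.0, size=2000),
])

# Evaluate the MSE of every candidate point prediction on a grid.
candidates = np.linspace(20.0, 70.0, 501)
mse = ((samples[None, :] - candidates[:, None]) ** 2).mean(axis=1)
best = candidates[np.argmin(mse)]

# Fraction of plausible values lying within 3 points of the MSE-optimal guess.
near_best = np.mean(np.abs(samples - best) < 3.0)

print("MSE-optimal point prediction:", best)
print("sample mean:", samples.mean())
print("fraction of samples near that prediction:", near_best)
```

The MSE-optimal prediction coincides with the sample mean, which falls between the two modes where almost none of the plausible values lie. This is the failure mode that motivates sampling from the posterior instead of regressing to a single point.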

Paper Structure (32 sections, 7 equations, 10 figures, 5 tables)


Figures (10)

  • Figure 1: Illustration of the proposed LVEF probabilistic inference process. The model reverses the diffusion process (blue), transforming random noise (orange) into plausible LVEF samples. Crucially, this trajectory is guided by the fusion of echocardiogram videos and patient demographic attributes (e.g., age, sex). By sampling multiple trajectories, we approximate the full posterior distribution (green). A narrow distribution indicates high confidence supported by unambiguous evidence, and vice versa.
  • Figure 2: Paradigm shift from deterministic to generative regression. (a) Conventional methods minimize MSE loss to predict a single point estimate (conditional mean), often failing to capture ambiguity. (b) Our proposed MCSDR framework treats LVEF estimation as a probabilistic inverse problem. It fuses the echocardiogram video with patient attributes to learn a conditional score function. This function guides a reverse diffusion process to generate the full posterior distribution $p(y|\mathbf{x}, \mathbf{c})$.
  • Figure 3: The stochastic generative trajectory. Top: The forward diffusion process gradually perturbs the scalar LVEF label $y_0$ into Gaussian noise $y_T$. Bottom: The reverse generative process, governed by the learned conditional score function, iteratively solves the inverse problem to reconstruct $y_0$ from noise, conditioned on clinical evidence.
  • Figure 4: Architecture of the Multimodal Conditional Score Network (MCSN). The network estimates the gradient field necessary for the generative process. To effectively fuse heterogeneous data, we employ dual encoders: a Video Encoder for spatiotemporal echocardiogram features and an Attribute Encoder for tabular clinical priors (e.g., age, weight). These features are concatenated and projected to form the Keys ($\mathbf{K}$) and Values ($\mathbf{V}$) in the cross-attention mechanism. This design allows the model to dynamically condition the denoising of the LVEF state (Query $\mathbf{Q}$) on specific patient contexts.
  • Figure 5: Visual analysis of model predictions on (a) CAMUS, (b) EchoNet-Pediatric, and (c) EchoNet-Dynamic test sets. Left: Scatter plots of Predicted vs. True LVEF. Right: Density plots comparing the distribution of predictions (Blue) against the ground truth (Red).
  • ...and 5 more figures
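The forward and reverse processes described in Figures 1 and 3 can be sketched on a scalar label. This is a toy under stated assumptions, not the authors' implementation: the linear noise schedule, the step count `T`, and the helper names are illustrative, and the learned conditional score network is replaced by the exact score of a Gaussian posterior $\mathcal{N}(\mu, \sigma^2)$, with $\mu$ and $\sigma$ standing in for whatever the video and attributes would imply. LVEF is expressed in normalized units (0.55 means 55%).

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def forward_diffuse(y0, t, rng):
    """Perturb clean labels y0 to diffusion step index t in closed form."""
    eps = rng.standard_normal(np.shape(y0))
    return np.sqrt(alpha_bars[t]) * y0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def toy_score(y, t, mu, sigma):
    """Exact score of the noised marginal when the posterior is N(mu, sigma^2).

    Stand-in for a learned conditional score network (an assumption).
    """
    mean_t = np.sqrt(alpha_bars[t]) * mu
    var_t = alpha_bars[t] * sigma**2 + 1.0 - alpha_bars[t]
    return -(y - mean_t) / var_t

def reverse_sample(mu, sigma, n, rng):
    """Draw n samples by ancestral sampling from pure noise back to y_0."""
    y = rng.standard_normal(n)  # y_T: Gaussian noise
    for t in range(T - 1, -1, -1):
        y = (y + betas[t] * toy_score(y, t, mu, sigma)) / np.sqrt(alphas[t])
        if t > 0:  # no noise is added on the final step
            y = y + np.sqrt(betas[t]) * rng.standard_normal(n)
    return y

rng = np.random.default_rng(0)

# Forward process: a clean label of 0.55 ends up close to standard noise.
y_T = forward_diffuse(np.full(2000, 0.55), T - 1, rng)

# Reverse process under clear versus ambiguous (hypothetical) evidence.
narrow = reverse_sample(mu=0.55, sigma=0.05, n=2000, rng=rng)
wide = reverse_sample(mu=0.55, sigma=0.10, n=2000, rng=rng)

print("y_T mean/std:", y_T.mean(), y_T.std())
print("narrow posterior mean/std:", narrow.mean(), narrow.std())
print("wide posterior mean/std:", wide.mean(), wide.std())
```

Repeating the reverse pass many times gives an empirical posterior; its spread plays the role of the narrow versus wide distributions described in Figure 1.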