OSA: Echocardiography Video Segmentation via Orthogonalized State Update and Anatomical Prior-aware Feature Enhancement

Rui Wang, Huisi Wu, Jing Qin

Abstract

Accurate and temporally consistent segmentation of the left ventricle from echocardiography videos is essential for estimating the ejection fraction and assessing cardiac function. However, modeling spatiotemporal dynamics remains difficult due to severe speckle noise and rapid non-rigid deformations. Existing linear recurrent models offer efficient in-context associative recall for temporal tracking, but they rely on unconstrained state updates. These updates cause progressive singular value decay in the state matrix, a phenomenon known as rank collapse, so that anatomical details are overwhelmed by noise. To address this, we propose OSA, a framework that constrains state evolution to the Stiefel manifold. We introduce the Orthogonalized State Update (OSU) mechanism, which formulates the memory evolution as Euclidean projected gradient descent on the Stiefel manifold to prevent rank collapse and maintain stable temporal transitions. Furthermore, an Anatomical Prior-aware Feature Enhancement (APFE) module explicitly separates anatomical structures from speckle noise through a physics-driven process, providing the temporal tracker with noise-resilient structural cues. Comprehensive experiments on the CAMUS and EchoNet-Dynamic datasets show that OSA achieves state-of-the-art segmentation accuracy and temporal stability while maintaining real-time inference efficiency for clinical deployment. Code is available at https://github.com/wangrui2025/OSA.

Paper Structure

This paper contains 19 sections, 17 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Illustration of different spatiotemporal memory update paradigms for echocardiography video segmentation. Top: Memory bank methods rely on sparse key-frame retrieval to capture temporal dynamics. Middle: Linear recurrent models perform unconstrained element-wise updates for state propagation. Bottom: Our OSA enforces a Stiefel manifold constraint, yielding stable and drift-free tracking across the cardiac cycle.
  • Figure 2: Challenges in echocardiography video segmentation: (a–b) red boxes mark speckle noise and blue boxes mark indistinct or blurred contours; (c–f) large shape and scale variations across the cardiac cycle.
  • Figure 3: Overview of the OSA architecture. The Anatomical Prior-aware Feature Enhancement (APFE) encoder extracts contrast-decomposed features from echocardiography video frames, which are recurrently updated through the Orthogonalized State Update (OSU) mechanism to maintain stable and geometry-aware spatiotemporal representations. The decoder reconstructs segmentation maps, and the entire model is optimized with loss.
  • Figure 4: Illustration of OSU. Standard Euclidean recurrences suffer from rank collapse over long sequences (manifested as singular value decay and an increasing condition number). OSU constrains state evolution to the Stiefel manifold via an orthogonalized update. The unconstrained intermediate state $\mathbf{S}_t^{\text{Euc}}$ is projected back onto the manifold as $\mathbf{S}_t$, ensuring stable temporal transitions.
  • Figure 5: Behavioral comparison of optimization vector fields. Unconstrained Euclidean updates cause the state to drift away from orthogonal constraints (climbing valley walls). OSU generates a corrective flow that pulls iterates back onto the Stiefel manifold (valley floor).
  • ...and 6 more figures
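
The projection step described in the Figure 4 caption, where an unconstrained intermediate state $\mathbf{S}_t^{\text{Euc}}$ is mapped back to a state $\mathbf{S}_t$ with orthonormal columns, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes an SVD-based polar projection, which is one standard way to find the nearest point on the Stiefel manifold, and it uses a hypothetical toy recurrence (a scalar decay plus a rank-one write) in place of the paper's update rule. The names `project_to_stiefel`, `condition_number`, and `run_toy_recurrence`, and all parameter values, are made up for illustration.

```python
import numpy as np


def project_to_stiefel(s_euc: np.ndarray) -> np.ndarray:
    """Return the closest matrix with orthonormal columns (Frobenius norm).

    For the thin SVD s_euc = U diag(sigma) V^T, the polar factor U V^T keeps
    the singular vectors and resets every singular value to 1.
    """
    u, _, vt = np.linalg.svd(s_euc, full_matrices=False)
    return u @ vt


def condition_number(state: np.ndarray) -> float:
    """Ratio of the largest to the smallest singular value of the state."""
    sigma = np.linalg.svd(state, compute_uv=False)
    return float(sigma.max() / sigma.min())


def run_toy_recurrence(constrained: bool, steps: int = 200, n: int = 8,
                       p: int = 4, decay: float = 0.9, seed: int = 0) -> np.ndarray:
    """Toy state recurrence (assumption: NOT the update rule from the paper).

    Each step decays the state and writes a rank-one outer product of two
    noisy, nearly repeated vectors, so the writes pile up in one direction.
    """
    rng = np.random.default_rng(seed)
    state = project_to_stiefel(rng.standard_normal((n, p)))  # orthonormal start
    base_k = rng.standard_normal(n)
    base_k /= np.linalg.norm(base_k)
    base_v = rng.standard_normal(p)
    base_v /= np.linalg.norm(base_v)
    for _ in range(steps):
        k = base_k + 0.05 * rng.standard_normal(n)
        v = base_v + 0.05 * rng.standard_normal(p)
        s_euc = decay * state + np.outer(k, v)  # unconstrained intermediate state
        # Constrained variant: pull the intermediate state back onto the manifold.
        state = project_to_stiefel(s_euc) if constrained else s_euc
    return state


if __name__ == "__main__":
    for constrained in (False, True):
        final_state = run_toy_recurrence(constrained)
        label = "projected" if constrained else "unconstrained"
        print(f"{label}: condition number = {condition_number(final_state):.3e}")
```

In this toy setting the projected state has a condition number of 1 by construction, because every singular value is reset to 1 at each step. The unconstrained state is dominated by the repeated write direction, so its singular values spread apart. The sketch only illustrates the geometric idea; it says nothing about the accuracy or runtime reported in the paper.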