BOTM: Echocardiography Segmentation via Bi-directional Optimal Token Matching
Zhihua Liu, Lei Tong, Xilin He, Che Liu, Rossella Arcucci, Chen Jin, Huiyu Zhou
TL;DR
This work tackles echocardiography segmentation under heavy noise and anatomical variation by enforcing cross-frame anatomical consistency through token-level optimal transport (OT). It introduces BOTM, which pairs a shared vision-transformer encoder with an OT-based token matching map $\mathbf{T}^{\star}$ and a Bi-directional Cross-Transport Attention proxy that uses forward and backward transport to refine token embeddings. Empirically, BOTM delivers state-of-the-art or competitive results on the CAMUS and TED datasets, with notable reductions in mean Hausdorff Distance and improvements in Dice, while remaining robust to artifacts and limited data. By operating on patch-level tokens and avoiding heavy ad-hoc adapters, BOTM provides better anatomical coherence and interpretability for temporal echocardiography segmentation.
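The two metrics named above can be made concrete with a small sketch. It assumes binary masks given as nested 0/1 lists and measures the Hausdorff Distance in pixel units over all foreground pixels. The evaluation protocol actually used for CAMUS and TED (contour extraction, physical spacing in mm, averaging) is not stated here, so the helper names `foreground`, `dice`, and `hausdorff` are illustrative only.

```python
import math


def foreground(mask):
    """Return the set of (row, col) coordinates where the mask is non-zero."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def dice(pred, target):
    """Dice coefficient 2|P & G| / (|P| + |G|); higher is better."""
    p, g = foreground(pred), foreground(target)
    if not p and not g:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * len(p & g) / (len(p) + len(g))


def hausdorff(pred, target):
    """Symmetric Hausdorff distance in pixels; lower is better."""
    p, g = foreground(pred), foreground(target)
    if not p or not g:
        raise ValueError("Hausdorff distance is undefined for an empty mask")

    def directed(a, b):
        # Largest distance from a point in a to its nearest point in b.
        return max(min(math.dist(x, y) for y in b) for x in a)

    return max(directed(p, g), directed(g, p))


# Toy example: the prediction has one extra foreground pixel.
pred_mask = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
true_mask = [[0, 1, 1],
             [0, 1, 0],
             [0, 0, 0]]
dice_score = dice(pred_mask, true_mask)
hd_score = hausdorff(pred_mask, true_mask)
```

Dice rewards region overlap, whereas the Hausdorff Distance penalizes the single worst boundary error, which is why a reduction in HD is read as evidence of fewer anatomically implausible outliers.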
Abstract
Existing echocardiography segmentation methods often suffer from anatomical inconsistency caused by shape variation, partial observation, and ambiguity between regions of similar intensity across 2D echocardiographic sequences. Under challenging low signal-to-noise ratio conditions, this leads to false-positive segmentations with anatomically defective structures. To provide a strong anatomical guarantee across echocardiographic frames, we propose BOTM (Bi-directional Optimal Token Matching), a novel segmentation framework that performs echocardiography segmentation and optimal anatomy transportation simultaneously. Given paired echocardiographic images, BOTM learns to match two sets of discrete image tokens by finding optimal correspondences from a novel anatomical transportation perspective. We further extend the token matching into a bi-directional cross-transport attention proxy that regulates anatomical consistency under the cyclic cardiac deformation in the temporal domain. Extensive experimental results show that BOTM generates stable and accurate segmentations (e.g., -1.917 HD on CAMUS2H LV, +1.9% Dice on TED) and provides a better matching interpretation with an anatomical consistency guarantee.
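To illustrate what matching two sets of image tokens by optimal transport can look like, the following is a minimal sketch and not the authors' implementation. It assumes a cosine-distance cost, uniform token masses, and an entropic (Sinkhorn) solver; none of these choices is stated in the abstract. The forward and backward projections at the end are a simple stand-in for using the plan in both directions, not the Bi-directional Cross-Transport Attention module itself. All function names and the parameters `eps` and `n_iters` are hypothetical.

```python
import math


def cosine_cost(a, b):
    """Cost matrix C[i][j] = 1 - cosine similarity of token a[i] and token b[j]."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v)) or 1.0  # guard zero vectors

    return [[1.0 - sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v))
             for v in b] for u in a]


def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropic OT with uniform marginals (assumed); returns an n x m plan."""
    n, m = len(cost), len(cost[0])
    mu, nu = 1.0 / n, 1.0 / m
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [mu / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def transpose(mat):
    return [list(col) for col in zip(*mat)]


def transport(plan, tokens):
    """Barycentric projection: each row of the plan averages the given tokens."""
    dim = len(tokens[0])
    out = []
    for row in plan:
        total = sum(row)
        out.append([sum(w * t[d] for w, t in zip(row, tokens)) / total
                    for d in range(dim)])
    return out


# Toy tokens from two frames; frame B is a permuted, slightly perturbed frame A.
tokens_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tokens_b = [[0.0, 1.1], [0.9, 0.0], [1.0, 1.2]]

plan = sinkhorn(cosine_cost(tokens_a, tokens_b))
matches = [max(range(len(row)), key=row.__getitem__) for row in plan]

forward = transport(plan, tokens_b)              # frame-B tokens seen from frame A
backward = transport(transpose(plan), tokens_a)  # frame-A tokens seen from frame B
```

In this toy case each token in frame A puts most of its mass on its perturbed counterpart in frame B, so `matches` recovers the permutation. A learned model would presumably compute the cost from encoder embeddings and backpropagate through the solver, but that detail is beyond what the abstract specifies.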
