UltraSeP: Sequence-aware Pre-training for Echocardiography Probe Movement Guidance
Haojun Jiang, Teng Wang, Zhenguo Sun, Yulin Wang, Yang Yue, Yu Sun, Ning Jia, Meng Li, Shaqi Luo, Shiji Song, Gao Huang
TL;DR
UltraSeP introduces sequence-aware pre-training to address individual variability in echocardiography by learning personalized cardiac structures from scanning sequences. The method uses a vision transformer and an action encoder to predict masked image features and probe movements within a trajectory, with segmental sampling and EMA-based targets to enhance learning. Downstream, a shared sequence transformer with ten plane-specific heads enables accurate probe guidance toward ten standard planes, outperforming diverse baselines with statistically significant improvements. The approach demonstrates strong generalization, robustness to cardiac-cycle variation, and real-time inference potential, supporting AI-assisted or robotic echocardiography in clinical settings.
Abstract
Echocardiography is an essential medical technique for diagnosing cardiovascular diseases, but its high operational complexity has led to a shortage of trained professionals. To address this issue, we introduce a novel probe movement guidance algorithm that has the potential to guide robotic systems or novices in adjusting the probe pose to acquire high-quality standard plane images. Cardiac ultrasound faces two major challenges: (1) the inherently complex structure of the heart, and (2) significant individual variations. Previous works have learned only the population-averaged structure of the heart rather than personalized cardiac structures, leading to a performance bottleneck. Clinically, we observe that sonographers dynamically adjust their interpretation of a patient's cardiac anatomy based on prior scanning sequences, and refine their scanning strategies accordingly. Inspired by this, we propose a novel sequence-aware self-supervised pre-training method. Specifically, our approach learns personalized three-dimensional cardiac structural features by predicting the masked-out image features and probe movement actions in a scanning sequence. We hypothesize that if the model can predict the missing content, it has acquired a good understanding of personalized cardiac structure. Extensive experiments on a large-scale expert scanning dataset with 1.67 million samples demonstrate that our proposed sequence-aware paradigm can effectively reduce probe guidance errors compared to other advanced baseline methods.
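
To make the data-preparation side of the pre-training idea concrete, the following is a minimal sketch of three ingredients named in the summary: segmental sampling of a scanning trajectory, random selection of positions to mask, and an EMA update of target parameters. It is an illustration under stated assumptions, not the authors' implementation: the function names (segmental_sample, random_mask, ema_update), the equal-length segment split, the mask ratio, and the momentum value are all hypothetical choices, and the encoders themselves are omitted.

```python
import random


def segmental_sample(num_frames, num_segments, rng):
    """Split frame indices [0, num_frames) into contiguous segments of
    near-equal length and draw one index from each (assumed scheme)."""
    if not 1 <= num_segments <= num_frames:
        raise ValueError("need 1 <= num_segments <= num_frames")
    indices = []
    for s in range(num_segments):
        start = s * num_frames // num_segments
        end = (s + 1) * num_frames // num_segments  # exclusive bound
        indices.append(rng.randrange(start, end))
    return indices


def random_mask(length, mask_ratio, rng):
    """Pick positions in the sampled sequence whose image features (or
    actions) would be hidden and later predicted (hypothetical ratio)."""
    num_masked = min(length, max(1, round(length * mask_ratio)))
    return sorted(rng.sample(range(length), num_masked))


def ema_update(target, online, momentum):
    """Move target parameters toward online parameters by an exponential
    moving average; plain lists of floats stand in for network weights."""
    return [momentum * t + (1.0 - momentum) * o for t, o in zip(target, online)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
frames = segmental_sample(num_frames=100, num_segments=8, rng=rng)
masked = random_mask(len(frames), mask_ratio=0.5, rng=rng)
target = ema_update([0.0, 1.0], [1.0, 1.0], momentum=0.9)
print(frames, masked, target)
```

In a full pipeline, the frames at the masked positions would be replaced by a mask token before the sequence model, and the EMA-updated target encoder would supply the features the model is trained to predict.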
