Structure-aware World Model for Probe Guidance via Large-scale Self-supervised Pre-train
Haojun Jiang, Meng Li, Zhenguo Sun, Ning Jia, Yu Sun, Shaqi Luo, Shiji Song, Gao Huang
TL;DR
This work tackles the difficulty of acquiring high-quality echocardiograms by creating a structure-aware cardiac world model trained with large-scale self-supervised pre-training. It introduces a 2D-3D joint pre-training framework based on I-JEPA, leveraging 1.36 million image–pose pairs from 364 routine scans to learn both 2D anatomical relationships and 3D spatial plane relationships. The downstream evaluation on probe guidance across ten standard views shows consistent MAE reductions, with up to 4.34% improvement, and ablations demonstrate that combining 2D and 3D objectives yields the best performance. The approach promises to assist novice sonographers and improve scanning efficiency, with potential for broader clinical deployment and validation on additional tasks.
Abstract
The complex structure of the heart poses significant challenges for echocardiography, especially for acquiring cardiac ultrasound images. Successful echocardiography requires a thorough understanding of the structures on each two-dimensional plane and of the spatial relationships between planes in three-dimensional space. In this paper, we propose a large-scale self-supervised pre-training method to acquire a cardiac structure-aware world model. The core innovation lies in constructing a self-supervised task that requires structural inference: predicting masked structures on a 2D plane and imagining another plane based on a pose transformation in 3D space. To support large-scale pre-training, we collected over 1.36 million echocardiograms from ten standard views, along with their 3D spatial poses. In the downstream probe guidance task, we demonstrate that our pre-trained model consistently reduces guidance errors across the ten most common standard views on a test set of 0.29 million samples from 74 routine clinical scans, indicating that structure-aware pre-training benefits scanning.
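The two ingredients of the self-supervised task described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `sample_block_mask` and `relative_pose`, the patch-grid and block sizes, and the representation of a probe pose as a 4x4 rigid homogeneous transform are all assumptions made for illustration. The first function splits a patch grid into visible context patches and one masked target block (the 2D objective); the second computes the pose transformation between two planes, which a predictor could be conditioned on (the 3D objective).

```python
import random


def sample_block_mask(grid_h, grid_w, block_h, block_w, rng):
    """Pick one rectangular target block of patches; the rest is context.

    Patches are indexed row-major on a grid_h x grid_w grid (hypothetical layout).
    """
    top = rng.randrange(grid_h - block_h + 1)
    left = rng.randrange(grid_w - block_w + 1)
    target = {
        r * grid_w + c
        for r in range(top, top + block_h)
        for c in range(left, left + block_w)
    }
    context = [i for i in range(grid_h * grid_w) if i not in target]
    return context, sorted(target)


def relative_pose(pose_a, pose_b):
    """Return inv(pose_a) @ pose_b for 4x4 rigid transforms given as nested lists.

    Assumes each pose is [R | t; 0 0 0 1] with R a rotation matrix.
    """
    rot_a = [row[:3] for row in pose_a[:3]]
    t_a = [row[3] for row in pose_a[:3]]
    # The inverse of a rigid transform is [R^T | -R^T t].
    rot_inv = [[rot_a[c][r] for c in range(3)] for r in range(3)]
    t_inv = [-sum(rot_inv[r][k] * t_a[k] for k in range(3)) for r in range(3)]
    inv_a = [rot_inv[r] + [t_inv[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]
    # Plain 4x4 matrix product inv_a @ pose_b.
    return [
        [sum(inv_a[r][k] * pose_b[k][c] for k in range(4)) for c in range(4)]
        for r in range(4)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    context, target = sample_block_mask(14, 14, 4, 4, rng)
    print(len(context), len(target))

    # Two planes that differ only by a translation (illustrative values).
    plane_a = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 2.0],
               [0.0, 0.0, 1.0, 3.0], [0.0, 0.0, 0.0, 1.0]]
    plane_b = [[1.0, 0.0, 0.0, 4.0], [0.0, 1.0, 0.0, 6.0],
               [0.0, 0.0, 1.0, 8.0], [0.0, 0.0, 0.0, 1.0]]
    print([row[3] for row in relative_pose(plane_a, plane_b)[:3]])
```

In a pre-training loop of this kind, the encoder would see only the context patches and be asked to predict representations of the target block, while the relative pose would tell the predictor which other plane to imagine. How the actual model encodes poses and chooses masks is not specified in this summary.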
