Structure-aware World Model for Probe Guidance via Large-scale Self-supervised Pre-train

Haojun Jiang, Meng Li, Zhenguo Sun, Ning Jia, Yu Sun, Shaqi Luo, Shiji Song, Gao Huang

TL;DR

This work tackles the difficulty of acquiring high-quality echocardiograms by creating a structure-aware cardiac world model trained with large-scale self-supervised pre-training. It introduces a 2D-3D joint pre-training framework based on I-JEPA, leveraging 1.36 million image–pose pairs from 364 routine scans to learn both 2D anatomical relationships and 3D spatial plane relationships. The downstream evaluation on probe guidance across ten standard views shows consistent MAE reductions, with up to 4.34% improvement, and ablations demonstrate that combining 2D and 3D objectives yields the best performance. The approach promises to assist novice sonographers and improve scanning efficiency, with potential for broader clinical deployment and validation on additional tasks.

Abstract

The complex structure of the heart leads to significant challenges in echocardiography, especially in acquiring cardiac ultrasound images. Successful echocardiography requires a thorough understanding of the structures on the two-dimensional plane and the spatial relationships between planes in three-dimensional space. In this paper, we propose a large-scale self-supervised pre-training method to acquire a cardiac structure-aware world model. The core innovation lies in constructing a self-supervised task that requires structural inference: predicting masked structures on a 2D plane and imagining another plane based on a pose transformation in 3D space. To support large-scale pre-training, we collected over 1.36 million echocardiograms from ten standard views, along with their 3D spatial poses. In the downstream probe guidance task, we demonstrate that our pre-trained model consistently reduces guidance errors across the ten most common standard views on a test set of 0.29 million samples from 74 routine clinical scans, indicating that structure-aware pre-training benefits scanning.
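The "pose transformation in 3D space" between two imaging planes can be illustrated with rigid transforms. The following is a minimal sketch, not the authors' implementation: it assumes each plane's pose is given as a 4x4 homogeneous probe-to-world matrix, and the helper names (`make_pose`, `invert_rigid`, `relative_pose`) are hypothetical. It computes the relative transform that a world model could be conditioned on when imagining a target plane from a source plane.

```python
import math


def rot_z(theta):
    """Return a 3x3 rotation about the z axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    pose = [list(row) + [t] for row, t in zip(rotation, translation)]
    pose.append([0.0, 0.0, 0.0, 1.0])
    return pose


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def invert_rigid(pose):
    """Invert a rigid transform using [R | t]^-1 = [R^T | -R^T t]."""
    r_t = [[pose[j][i] for j in range(3)] for i in range(3)]
    t = [pose[i][3] for i in range(3)]
    new_t = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return make_pose(r_t, new_t)


def relative_pose(pose_src, pose_tgt):
    """Assumed convention: poses are probe-to-world, so the target frame
    expressed in the source frame is inv(pose_src) @ pose_tgt."""
    return matmul(invert_rigid(pose_src), pose_tgt)


# Two planes with the same orientation, 5 units apart along world y.
src = make_pose(rot_z(math.pi / 2), [0.0, 0.0, 0.0])
tgt = make_pose(rot_z(math.pi / 2), [0.0, 5.0, 0.0])
rel = relative_pose(src, tgt)
```

In this example the relative rotation is the identity and the offset appears along the source frame's x axis, because the source frame is itself rotated by 90 degrees about z.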

Paper Structure (10 sections, 4 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: Diagram illustrating the capabilities of a cardiac world model. We aim to develop a cardiac world model that can understand both two-dimensional and three-dimensional structures. (Left) The world model needs to recognize various structures in two-dimensional planes and understand their spatial relationships for in-plane probe adjustment. (Right) Understanding the three-dimensional structure of the heart, specifically the spatial relationships between different planes, is crucial for out-of-plane probe adjustment. The images used in the diagram are sourced from Mitchell et al. (2019).
  • Figure 2: Diagram illustrating the pre-training method and downstream task. The world model and encoder are required to predict features on the target plane based on the spatial relationships in both two-dimensional and three-dimensional spaces.
  • Figure 3: Anatomic and 2D ultrasound images of ten standard planes. The cardiac images used in the diagram are sourced from Mitchell et al. (2019).
  • Figure 4: Ablation of the pre-training objectives. The figure shows the relative change in MAE across six degrees of freedom for ten standard views, comparing different pre-training objectives against Cardiac Dreamer (Jiang et al., 2024). Smaller values indicate better performance. (a) Our proposed 2D-3D joint structure-aware pre-training. (b, c) Pre-training focused only on 2D or only on 3D structures.
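
The relative MAE change plotted in Figure 4 can be sketched as follows. This is an illustrative calculation, not the authors' evaluation code: the degree-of-freedom names (`tx`, `ty`, `tz`, `rx`, `ry`, `rz`) and the numbers are made up, and the sign convention (negative means lower error than the baseline) is an assumption chosen to match "smaller values indicate better performance".

```python
def mae(predictions, targets):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def relative_mae_change(mae_model, mae_baseline):
    """Percent change in MAE versus the baseline (negative means lower error)."""
    return 100.0 * (mae_model - mae_baseline) / mae_baseline


# Hypothetical per-DOF MAE values for one view (illustrative numbers only).
baseline = {"tx": 4.0, "ty": 5.0, "tz": 2.0, "rx": 8.0, "ry": 10.0, "rz": 6.0}
pretrained = {"tx": 3.8, "ty": 5.0, "tz": 1.9, "rx": 7.6, "ry": 9.5, "rz": 6.0}

# One relative-change value per degree of freedom, as in a single panel row.
changes = {
    dof: relative_mae_change(pretrained[dof], baseline[dof]) for dof in baseline
}
```

Reporting the change per degree of freedom rather than as one averaged number keeps translation and rotation errors, which have different units, from being mixed.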