AVP-AP: Self-supervised Automatic View Positioning in 3D cardiac CT via Atlas Prompting

Xiaolin Fan, Yan Wang, Yingying Zhang, Mingkun Bao, Bosen Jia, Dong Lu, Yifan Gu, Jian Cheng, Haogang Zhu

TL;DR

This work tackles automatic view positioning for 3D cardiac CT by introducing AVP-AP, a self-supervised atlas-prompting framework. A canonical 3D atlas is generated from co-registered CT volumes, and PosNet learns to map arbitrary 2D slices into atlas space. A coarse position estimate is then obtained by rigidly aligning the atlas to the target volume, followed by a fine-stage refinement using foundation-model feature similarity. The method outperforms traditional optimization-based and prior learning-based approaches in SSIM across arbitrary views, and achieves results competitive with radiologists while reducing positioning time; external validation on a public dataset supports its generalization. The approach handles arbitrary slice orientations and could potentially be extended to other organs or modalities, to multimodal prompts, and to real-time deployment. Key contributions include the atlas prompting paradigm, the two-stage coarse-to-fine positioning pipeline, and ablations demonstrating the importance of atlas alignment, backbone choice, and label representations.

Abstract

Automatic view positioning is crucial for cardiac computed tomography (CT) examinations, including disease diagnosis and surgical planning. However, it is highly challenging due to individual variability and the large 3D search space. Existing work needs labor-intensive and time-consuming manual annotations to train view-specific models, which are limited to predicting only a fixed set of planes. In real clinical scenarios, the challenge of positioning semantic 2D slices with any orientation into the varying coordinate space of an arbitrary 3D volume remains unsolved. We thus introduce a novel framework, AVP-AP, the first to use Atlas Prompting for self-supervised Automatic View Positioning in 3D CT volumes. Specifically, this paper first proposes an atlas prompting method, which generates a 3D canonical atlas and trains a network to map slices to their corresponding positions in the atlas space in a self-supervised manner. Then, guided by atlas prompts corresponding to the given query images in a reference CT, we identify the coarse positions of slices in the target CT volume using a rigid transformation between the 3D atlas and the target CT volume, effectively reducing the search space. Finally, we refine the coarse positions by maximizing the similarity between the predicted slices and the query images in the feature space of a given foundation model. Our framework is flexible and efficient, outperforming other methods by 19.8% in average structural similarity (SSIM) for arbitrary view positioning and achieving a 9% higher SSIM in the two-chamber view compared to four radiologists. Meanwhile, experiments on a public dataset validate our framework's generalizability.
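
The coarse positioning step can be illustrated with a minimal sketch. It assumes that a slice position is stored as three 3D points on the slice plane (one reading of the Three-Point label representation named in Figure 3) and that a rigid transform from atlas space to the target volume, a rotation `R` plus a translation `t`, has already been estimated by some registration step. The function names, the direction of the transform, and the numeric values are illustrative assumptions, not the authors' implementation.

```python
import math

def rotation_about_z(angle_rad):
    """3x3 rotation matrix for a rotation about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_rigid(rotation, translation, point):
    """Map one 3D point with x' = R x + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

def map_slice_position(rotation, translation, three_points):
    """Carry a three-point slice position from atlas space to target space."""
    return [apply_rigid(rotation, translation, p) for p in three_points]

def plane_normal(three_points):
    """Unit normal of the plane through three non-collinear points."""
    a, b, c = three_points
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    length = math.sqrt(sum(x * x for x in n))
    return tuple(x / length for x in n)

# Hypothetical atlas-space position of a query slice (as PosNet might predict).
atlas_position = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# Hypothetical atlas-to-target rigid transform: 90 degrees about z, then a shift.
R = rotation_about_z(math.pi / 2)
t = (10.0, 0.0, 0.0)

coarse_position = map_slice_position(R, t, atlas_position)
coarse_normal = plane_normal(coarse_position)
```

Because the transform is rigid, distances between the three points are preserved, so the mapped points still describe a valid plane in the target volume; only its location and orientation change.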

Paper Structure

This paper contains 30 sections, 13 equations, 8 figures, 5 tables, 1 algorithm.

Figures (8)

  • Figure 1: An overview of view planning and view positioning. (a) View planning: given a target volume, the view-specific model is trained to predict a limited set of target 2D slices. When the required slices change, the model needs to be retrained. (b) View positioning: for a target volume, given arbitrary query images in a reference CT, our framework can identify the most similar target 2D slices and their corresponding positions without the need for retraining.
  • Figure 2: Our framework, AVP-AP, for self-supervised Automatic View Positioning in 3D cardiac CT volume via Atlas Prompting. For a target CT volume $\mathcal{V}$, given a query image $q$ in a reference CT, we first obtain the position $\hat{p}_q^\mathcal{R}$ of the query image in the atlas space using PosNet. Then, the atlas $\mathcal{R}$ and the position $\hat{p}_q^\mathcal{R}$ form the atlas prompt. Next, guided by the atlas prompt, the coarse position $\hat{p}_q^\mathcal{V}$ of the query image in the target CT volume is identified using a rigid transformation between the atlas space and the target CT volume. Finally, by maximizing the similarity between the predicted slices and the query images in the feature space of a given foundation model, the target slice $s_q^\mathcal{V}$ and its fine position $p_q^\mathcal{V}$ in the target CT volume are determined. The atlas prompting method first unifies the coordinate system and generates a 3D canonical atlas $\mathcal{R}$ from a given group of 3D CT volumes. Then, 2D image slices $\{s_i^{\mathcal{R}}\}_{i=1}^n$ resampled from all co-aligned 3D CT volumes are used to train a regression model, PosNet, to predict the positions $\{p_i^{\mathcal{R}}\}_{i=1}^n$ in atlas $\mathcal{R}$ via self-supervised learning.
  • Figure 3: The coordinate system, normal sampling, and positioning and searching of slices. (a) The coordinate system and Three-Point label representation of the 2D slice itself in 3D CT volume space. (b) Slice plane normals $\Omega$ w.r.t. the origin resampled by FSS. (c) Multi-iteration resampling of normals w.r.t. the origin around the normal of the coarse position $\hat{p}_q^{\mathcal{V}}$ (i.e., blue arrow).
  • Figure 4: Internal testing: visualization of results from three different methods. The first column contains four query images: 2C, 4C, Y, and RV$_1$, not from the target CT volume. The second and third columns are the slices localized by Lea-AVP and the radiologist, while the fourth and fifth columns show the results of coarse and fine positioning by our method, with "Ours-coarse" representing the coarse positioning and "Ours-fine" indicating the fine positioning. The last column shows the comparison of positions. Green is our position, red is the result labeled by the radiologist, and yellow is the result located by Lea-AVP.
  • Figure 5: Internal testing: comparison of positioning times between our framework AVP-AP and radiologists. The less time spent, the better. (a), (b), (c), and (d) represent the comparison of positioning time for four different types of query images: 2C, 4C, Y, and RV$_1$. Rad1 represents the first radiologist, the same as below.
  • ...and 3 more figures
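
The normal sampling and multi-iteration search described in the Figure 3 caption can be sketched as follows. The caption does not expand "FSS", so this sketch assumes a Fibonacci-lattice style sampling of near-uniform unit normals, restricted to a cone around the current best normal and narrowed at each iteration. The `score` callable is a hypothetical stand-in for the feature-space similarity between a resampled slice and the query image; the function names, cone angle, sample count, and halving schedule are all illustrative assumptions rather than the paper's settings.

```python
import math

GOLDEN_ANGLE = math.pi * (3.0 - math.sqrt(5.0))

def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)

def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fibonacci_cap(axis, max_angle_rad, n):
    """n near-uniform unit normals within max_angle_rad of axis.

    With max_angle_rad = pi this covers the whole sphere.
    """
    axis = normalize(axis)
    # Build two unit vectors orthogonal to axis and to each other.
    helper = (1.0, 0.0, 0.0) if abs(axis[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = normalize(cross(helper, axis))
    v = cross(axis, u)
    cos_limit = math.cos(max_angle_rad)
    normals = []
    for i in range(n):
        # Height along the axis, spaced evenly between cos_limit and 1.
        z = 1.0 - (1.0 - cos_limit) * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        theta = GOLDEN_ANGLE * i
        normals.append(tuple(
            r * math.cos(theta) * u[k] + r * math.sin(theta) * v[k] + z * axis[k]
            for k in range(3)
        ))
    return normals

def refine_normal(coarse_normal, score, iterations=3,
                  max_angle_rad=math.radians(30.0), n=200):
    """Coarse-to-fine search: resample around the best normal, then narrow."""
    best = normalize(coarse_normal)
    for _ in range(iterations):
        candidates = [best] + fibonacci_cap(best, max_angle_rad, n)
        best = max(candidates, key=score)
        max_angle_rad *= 0.5  # shrink the search cone each iteration
    return best

# Toy usage: the score peaks at a hidden target normal (hypothetical).
target = normalize((0.2, 0.1, 1.0))
coarse = (0.0, 0.0, 1.0)
refined = refine_normal(coarse, lambda nrm: dot(nrm, target))
```

Keeping the current best normal in each candidate set means the score never decreases between iterations, which is why the search can only improve on the coarse position under this scheme.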