VasoMIM: Vascular Anatomy-Aware Masked Image Modeling for Vessel Segmentation

De-Xing Huang, Xiao-Hu Zhou, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Tian-Yu Xiang, Rui-Ze Ma, Nu-Fang Xiao, Zeng-Guang Hou

TL;DR

VasoMIM tackles vessel segmentation in X-ray angiograms under scarce pixel-level annotations by integrating vascular anatomy into masked image modeling. The method introduces an anatomy-guided masking strategy and an anatomical consistency loss, both built on Frangi-derived unsupervised vessel maps, and optimizes the objective $\mathcal{L}_{\rm train}=\mathcal{L}_{\rm rec.}+\mathcal{L}_{\rm cons.}$. Key design choices include a patch-wise vascular distribution $f(m_i)$, a weak-to-strong masking schedule, and a lightweight segmentor trained on Frangi pseudo-labels to enable end-to-end differentiability. Empirical results on three datasets show state-of-the-art performance and strong cross-domain generalization, indicating that incorporating anatomical priors into SSL substantially improves vascular representations and downstream segmentation performance.
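To make the structure of the training objective concrete, the following is a minimal sketch of a loss that adds a reconstruction term and a consistency term. It is not the authors' implementation: the mean-squared error restricted to masked patches (a common MAE-style choice), the binary cross-entropy form of the consistency term, and all function names are assumptions made for illustration. Only the additive combination of the two terms is taken from the summary above.

```python
import math


def reconstruction_loss(original, reconstructed, masked):
    """Mean squared error over masked patches only (assumed, MAE-style).

    original, reconstructed: lists of patches, each a flat list of pixel values.
    masked: list of booleans, True where the patch was masked.
    """
    total, count = 0.0, 0
    for patch_o, patch_r, is_masked in zip(original, reconstructed, masked):
        if not is_masked:
            continue
        total += sum((o - r) ** 2 for o, r in zip(patch_o, patch_r))
        count += len(patch_o)
    return total / count if count else 0.0


def consistency_loss(seg_original, seg_reconstructed, eps=1e-7):
    """Binary cross-entropy between two vessel-probability maps (assumed form).

    seg_original: segmentor output on the original image (used as the target).
    seg_reconstructed: segmentor output on the reconstructed image.
    """
    total = 0.0
    for p, q in zip(seg_original, seg_reconstructed):
        q = min(max(q, eps), 1.0 - eps)  # clip to keep log() finite
        total += -(p * math.log(q) + (1.0 - p) * math.log(1.0 - q))
    return total / len(seg_original)


def train_loss(original, reconstructed, masked, seg_original, seg_reconstructed):
    """Unweighted sum of the two terms, mirroring the stated objective."""
    return (reconstruction_loss(original, reconstructed, masked)
            + consistency_loss(seg_original, seg_reconstructed))
```

In this sketch the segmentor outputs are passed in as precomputed probability lists; in an actual training loop they would come from the lightweight segmentor applied to the original and reconstructed images.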

Abstract

Accurate vessel segmentation in X-ray angiograms is crucial for numerous clinical applications. However, the scarcity of annotated data presents a significant challenge, which has driven the adoption of self-supervised learning (SSL) methods such as masked image modeling (MIM) to leverage large-scale unlabeled data for learning transferable representations. Unfortunately, conventional MIM often fails to capture vascular anatomy because of the severe class imbalance between vessel and background pixels, leading to weak vascular representations. To address this, we introduce Vascular anatomy-aware Masked Image Modeling (VasoMIM), a novel MIM framework tailored for X-ray angiograms that explicitly integrates anatomical knowledge into the pre-training process. Specifically, it comprises two complementary components: anatomy-guided masking strategy and anatomical consistency loss. The former preferentially masks vessel-containing patches to focus the model on reconstructing vessel-relevant regions. The latter enforces consistency in vascular semantics between the original and reconstructed images, thereby improving the discriminability of vascular representations. Empirically, VasoMIM achieves state-of-the-art performance across three datasets. These findings highlight its potential to facilitate X-ray angiogram analysis.
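The anatomy-guided masking idea can be sketched as follows, assuming a binary vessel map (for example, a thresholded Frangi response) is already available. The definition of the patch-wise distribution as the normalized vessel-pixel count, the sampling without replacement, the `smoothing` constant, and all function names are illustrative assumptions; the exact formulation used in the paper is not given in this section.

```python
import random


def patch_vessel_distribution(vessel_map, patch_size):
    """Return one probability per patch, proportional to its vessel-pixel count.

    vessel_map: 2D list of 0/1 values (hypothetical binarized vessel map).
    Patches are enumerated row by row.
    """
    height, width = len(vessel_map), len(vessel_map[0])
    counts = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            counts.append(sum(
                vessel_map[r][c]
                for r in range(top, min(top + patch_size, height))
                for c in range(left, min(left + patch_size, width))
            ))
    total = sum(counts)
    if total == 0:  # no vessels detected: fall back to uniform masking
        return [1.0 / len(counts)] * len(counts)
    return [c / total for c in counts]


def anatomy_guided_mask(distribution, mask_ratio, rng, smoothing=1e-3):
    """Sample masked patch indices without replacement, favouring vessel patches.

    smoothing keeps every patch selectable, so the weights never sum to zero.
    """
    num_masked = int(mask_ratio * len(distribution))
    weights = [p + smoothing for p in distribution]
    candidates = list(range(len(distribution)))
    chosen = []
    for _ in range(num_masked):
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(pick))
        weights.pop(pick)
    return sorted(chosen)


# Usage: a 4x4 map whose vessels all lie in the top-left 2x2 patch.
vessel_map = [
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
dist = patch_vessel_distribution(vessel_map, patch_size=2)
mask = anatomy_guided_mask(dist, mask_ratio=0.5, rng=random.Random(0))
```

With a masking ratio of 0.5, two of the four patches are masked, and the vessel-containing patch is selected far more often than the background patches. A weak-to-strong schedule could be layered on top by varying how strongly the weights follow the distribution over training, but that schedule is not sketched here.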

Paper Structure

This paper contains 21 sections, 5 equations, 11 figures, 14 tables.

Figures (11)

  • Figure 1: Comparison of conventional MIM and VasoMIM. (a) Conventional MIM masks patches based on general rules and learns to reconstruct patches via minimizing pixel-level loss. (b) VasoMIM guides patch masking with vascular anatomy and enforces anatomical consistency during reconstruction, enabling the model to learn richer vascular representations. Dark gray patches are vessel-relevant regions.
  • Figure 2: Overall framework of VasoMIM. During pre-training, each X-ray angiogram is first processed by Frangi filter to extract its vascular anatomy. From this anatomy, we derive a patch-wise vascular anatomical distribution $f$ to guide the masking process. Finally, the model is optimized by minimizing $\mathcal{L}_{\rm train}$, which is a combination of standard pixel-wise reconstruction loss $\mathcal{L}_{\rm rec.}$ and the designed anatomical consistency loss $\mathcal{L}_{\rm cons.}$.
  • Figure 3: Some evidence of anatomy-guided masking strategy. (a) Proportion of vessel-containing patches in the masked patches during pre-training. (b) Patch-wise masking ratio over the pre-training process, i.e., $\frac{1}{E}\sum_{j=1}^E\mathbb{I}\left(\text{Patch $x_i$ is masked in epoch $j$}\right)$.
  • Figure 4: Qualitative results on (a) ARCADE, (b) CAXF, and (c) XCAV. Details are zoomed in within blue boxes.
  • Figure 5: In-depth analysis of the masking ratio $\gamma$ on ARCADE. $\gamma$ is set to $0.5$ in our default settings.
  • ...and 6 more figures
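
The patch-wise masking ratio in the Figure 3(b) caption, $\frac{1}{E}\sum_{j=1}^E\mathbb{I}\left(\text{Patch $x_i$ is masked in epoch $j$}\right)$, is the fraction of epochs in which a given patch was masked. A minimal sketch of that statistic is shown below; the function name and the input layout (one collection of masked patch indices per epoch, for a single image) are assumptions for illustration.

```python
def patchwise_masking_ratio(masks_per_epoch, num_patches):
    """Fraction of epochs in which each patch was masked.

    masks_per_epoch: list of length E; entry j holds the indices of the
    patches masked in epoch j (hypothetical logging format).
    """
    num_epochs = len(masks_per_epoch)
    counts = [0] * num_patches
    for masked in masks_per_epoch:
        for i in set(masked):  # the indicator counts each patch once per epoch
            counts[i] += 1
    return [c / num_epochs for c in counts]


# Usage: patch 0 is masked in all four epochs, patch 1 in two, patches 2 and 3 in one.
ratios = patchwise_masking_ratio([[0, 1], [0, 2], [0, 3], [0, 1]], num_patches=4)
```

Under anatomy-guided masking, vessel-containing patches would be expected to show higher ratios than background patches.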