VasoMIM: Vascular Anatomy-Aware Masked Image Modeling for Vessel Segmentation
De-Xing Huang, Xiao-Hu Zhou, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Tian-Yu Xiang, Rui-Ze Ma, Nu-Fang Xiao, Zeng-Guang Hou
TL;DR
VasoMIM addresses vessel segmentation in X-ray angiograms under scarce pixel-level annotations by integrating vascular anatomy into masked image modeling. The method introduces an anatomy-guided masking strategy and an anatomical consistency loss, both built on Frangi-derived unsupervised vessel maps, and optimizes the objective $L_{\rm train}=L_{\rm rec.}+L_{\rm cons.}$. Key design choices include a patch-wise vascular distribution $f(m_i)$, a weak-to-strong masking schedule, and a lightweight segmentor trained on Frangi pseudo-labels to enable end-to-end differentiability. Empirical results on three datasets show state-of-the-art performance and strong cross-domain generalization, indicating that incorporating anatomical priors into SSL substantially improves vascular representations and downstream segmentation performance.
Abstract
Accurate vessel segmentation in X-ray angiograms is crucial for numerous clinical applications. However, the scarcity of annotated data presents a significant challenge, which has driven the adoption of self-supervised learning (SSL) methods such as masked image modeling (MIM) to leverage large-scale unlabeled data for learning transferable representations. Unfortunately, conventional MIM often fails to capture vascular anatomy because of the severe class imbalance between vessel and background pixels, leading to weak vascular representations. To address this, we introduce Vascular anatomy-aware Masked Image Modeling (VasoMIM), a novel MIM framework tailored for X-ray angiograms that explicitly integrates anatomical knowledge into the pre-training process. Specifically, it comprises two complementary components: an anatomy-guided masking strategy and an anatomical consistency loss. The former preferentially masks vessel-containing patches to focus the model on reconstructing vessel-relevant regions. The latter enforces consistency in vascular semantics between the original and reconstructed images, thereby improving the discriminability of vascular representations. Empirically, VasoMIM achieves state-of-the-art performance across three datasets. These findings highlight its potential to facilitate X-ray angiogram analysis.
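To make the idea of preferentially masking vessel-containing patches concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binary vessel map is already available (for example from a Frangi filter followed by thresholding), and it assumes that a patch's masking probability is proportional to its vessel-pixel count. The function names, the `guidance` blending parameter, and the `eps` smoothing term are hypothetical and do not come from the paper; `guidance` is only one possible way a weak-to-strong schedule could be realized, by ramping it from 0 towards 1 during pre-training.

```python
import numpy as np


def patch_vessel_distribution(vessel_map, patch_size, guidance=1.0, eps=1e-6):
    """Per-patch masking probabilities from a binary vessel map.

    Assumption (not from the paper): probability is proportional to the
    number of vessel pixels in the patch, blended with a uniform
    distribution by the hypothetical `guidance` weight in [0, 1].
    """
    h, w = vessel_map.shape
    gh, gw = h // patch_size, w // patch_size
    # Crop to a whole number of patches, then view as (gh, p, gw, p).
    cropped = vessel_map[: gh * patch_size, : gw * patch_size]
    patches = cropped.reshape(gh, patch_size, gw, patch_size)
    # Count vessel pixels inside each patch and flatten to one value per patch.
    counts = patches.sum(axis=(1, 3)).reshape(-1).astype(np.float64)
    # eps keeps pure-background patches selectable, so sampling without
    # replacement never runs out of candidates.
    vessel = (counts + eps) / (counts + eps).sum()
    uniform = np.full(counts.size, 1.0 / counts.size)
    probs = (1.0 - guidance) * uniform + guidance * vessel
    return probs / probs.sum()


def sample_mask(probs, mask_ratio, rng):
    """Draw a boolean patch mask (True = masked) without replacement."""
    n = probs.size
    n_mask = int(round(mask_ratio * n))
    idx = rng.choice(n, size=n_mask, replace=False, p=probs)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask


# Usage: a toy 8x8 map whose vessel pixels all lie in the top-left patch.
toy_map = np.zeros((8, 8))
toy_map[:4, :4] = 1
toy_probs = patch_vessel_distribution(toy_map, patch_size=4)
toy_mask = sample_mask(toy_probs, mask_ratio=0.5, rng=np.random.default_rng(0))
```

With `guidance=0.0` the sampler reduces to uniform random masking as in conventional MIM, which makes it easy to compare against the anatomy-guided variant under the same masking ratio.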
