Vascular anatomy-aware self-supervised pre-training for X-ray angiogram analysis

De-Xing Huang, Chaohui Yu, Xiao-Hu Zhou, Tian-Yu Xiang, Qin-Yi Zhang, Mei-Jiang Gui, Rui-Ze Ma, Chen-Yu Wang, Nu-Fang Xiao, Fan Wang, Zeng-Guang Hou

TL;DR

This work addresses the scarcity of annotated data in X-ray angiogram analysis by introducing VasoMIM, a self-supervised pre-training framework that injects vascular anatomical knowledge into masked image modeling. VasoMIM combines two components: an anatomy-guided masking strategy, which uses Frangi-based vessel extraction together with a co-guidance segmentor, and an anatomical consistency loss, which preserves vessel structure between the original and reconstructed images. The authors curate XA-170K, described as the largest X-ray angiogram pre-training dataset to date, and report state-of-the-art performance on four downstream tasks across six datasets, along with ablations and scaling analyses. The results indicate that domain-specific SSL can yield transferable vascular representations, which positions VasoMIM as a candidate foundation model for X-ray angiogram analysis and may reduce the annotation burden in clinical settings.

Abstract

X-ray angiography is the gold standard imaging modality for cardiovascular diseases. However, current deep learning approaches for X-ray angiogram analysis are severely constrained by the scarcity of annotated data. While large-scale self-supervised learning (SSL) has emerged as a promising solution, its potential in this domain remains largely unexplored, primarily due to the lack of effective SSL frameworks and large-scale datasets. To bridge this gap, we introduce a vascular anatomy-aware masked image modeling (VasoMIM) framework that explicitly integrates domain-specific anatomical knowledge. Specifically, VasoMIM comprises two key designs: an anatomy-guided masking strategy and an anatomical consistency loss. The former strategically masks vessel-containing patches to compel the model to learn robust vascular semantics, while the latter preserves structural consistency of vessels between original and reconstructed images, enhancing the discriminability of the learned representations. In conjunction with VasoMIM, we curate XA-170K, the largest X-ray angiogram pre-training dataset to date. We validate VasoMIM on four downstream tasks across six datasets, where it demonstrates superior transferability and achieves state-of-the-art performance compared to existing methods. These findings highlight the significant potential of VasoMIM as a foundation model for advancing a wide range of X-ray angiogram analysis tasks. VasoMIM and XA-170K will be available at https://github.com/Dxhuang-CASIA/XA-SSL.
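The anatomy-guided masking idea can be illustrated with a minimal sketch. It assumes a binary vessel map has already been extracted (for example, by thresholding a Frangi filter response) and samples patches to mask with probability proportional to their vessel content. The function names, the `eps` smoothing term, the patch size of 16, and the 0.75 mask ratio are illustrative assumptions, not the authors' implementation; the exact form of the paper's patch-wise distribution is not given in this summary.

```python
import numpy as np


def patch_vessel_fraction(vessel_mask, patch_size):
    """Fraction of vessel pixels in each non-overlapping patch (row-major order)."""
    h, w = vessel_mask.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # Index [i, a, j, b] maps to pixel (i * patch_size + a, j * patch_size + b).
    patches = vessel_mask.reshape(gh, patch_size, gw, patch_size)
    return patches.mean(axis=(1, 3)).reshape(-1)


def sample_anatomy_guided_mask(vessel_mask, patch_size=16, mask_ratio=0.75,
                               eps=1e-3, rng=None):
    """Sample a boolean patch mask that favors vessel-rich patches.

    Assumption: the sampling probability is proportional to the vessel
    fraction plus a small eps, so background patches can still be masked.
    """
    rng = np.random.default_rng() if rng is None else rng
    frac = patch_vessel_fraction(np.asarray(vessel_mask, dtype=np.float64),
                                 patch_size)
    weights = frac + eps
    probs = weights / weights.sum()
    n_mask = int(round(mask_ratio * frac.size))
    # Weighted sampling without replacement over patch indices.
    idx = rng.choice(frac.size, size=n_mask, replace=False, p=probs)
    mask = np.zeros(frac.size, dtype=bool)
    mask[idx] = True
    return mask
```

Calling `sample_anatomy_guided_mask(vessel_mask)` on a 2D binary array returns one boolean entry per patch, where `True` marks a patch to hide from the encoder. Keeping `eps` above zero is a design choice in this sketch: it prevents the sampler from ignoring background entirely, which would otherwise fail whenever an image has fewer vessel patches than the requested number of masked patches.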

Paper Structure (22 sections, 5 equations, 10 figures, 10 tables)

Figures (10)

  • Figure 1: Conceptual comparison. (a) Conventional MIM masks patches based on generic rules and utilizes standard supervision to reconstruct the original image. (b) VasoMIM employs anatomical guidance to selectively mask vessel-relevant regions (highlighted in yellow) and enforces structural consistency via anatomical supervision, enabling the model to learn richer vascular representations.
  • Figure 2: Generic vs. anatomy-guided masking strategies. While generic methods like AMT and HPM fail to prioritize vascular structures, VasoMIM effectively focuses on vessel-relevant regions. Red denotes a higher masking probability, while blue indicates the opposite.
  • Figure 3: Overview of VasoMIM. First, vascular anatomy is extracted from the input X-ray angiogram via the Frangi filter. A patch-wise vascular anatomical distribution $f(g_i)$ is then computed to guide the masking strategy, prioritizing vessel-relevant regions. Finally, the model is optimized by minimizing the total objective $\mathcal{L}_{\rm MIM}$, which combines the standard pixel-level reconstruction loss $\mathcal{L}_{\rm rec.}$ with the proposed anatomical consistency loss $\mathcal{L}_{\rm cons.}$ to learn discriminative vascular representations.
  • Figure 4: Overview of downstream tasks.
  • Figure 5: Qualitative analysis of the anatomy-guided masking strategy. Left: The proportion of vessel-containing patches among all masked patches across training epochs. Right: Visual comparison. We highlight failure cases of the Frangi filter in which UNeXt-S, leveraging its strong inductive bias, extracts vascular anatomy more accurately and thus provides more precise guidance. The heatmaps display the patch-wise masking ratio, i.e., $\frac{1}{E}\sum_{j=1}^E\mathbb{I}\left(\text{Patch $x_i$ is masked in epoch $j$}\right)$.
  • ...and 5 more figures
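
The two quantities described in the Figure 5 caption can be computed as in the following sketch. It assumes the per-epoch masks for one image are stored as a boolean array of shape `(E, N)` (epochs by patches) and that a per-patch flag marks which patches contain vessels; the function names and array layout are hypothetical and chosen only for illustration.

```python
import numpy as np


def patchwise_masking_ratio(mask_history):
    """Per-patch masking ratio: (1/E) * sum_j I(patch i is masked in epoch j).

    mask_history: array-like of shape (E, N), True where a patch was masked.
    """
    history = np.asarray(mask_history, dtype=bool)
    return history.mean(axis=0)


def vessel_patch_proportion(mask, contains_vessel):
    """Share of masked patches that contain vessels, for a single epoch."""
    mask = np.asarray(mask, dtype=bool)
    contains_vessel = np.asarray(contains_vessel, dtype=bool)
    n_masked = int(mask.sum())
    if n_masked == 0:
        return 0.0  # convention chosen for this sketch when nothing is masked
    return float((mask & contains_vessel).sum() / n_masked)
```

Reshaping the output of `patchwise_masking_ratio` to the patch grid gives a heatmap of the kind the caption describes, and evaluating `vessel_patch_proportion` once per epoch gives a curve comparable to the left panel.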