vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation

Bastian Wittmann, Yannick Wattenberg, Tamaz Amiranashvili, Suprosanna Shit, Bjoern Menze

TL;DR

vesselFM tackles universal 3D blood vessel segmentation by training a foundation model on three heterogeneous data sources, $\mathcal{D}_\text{real}$, $\mathcal{D}_\text{drand}$, and $\mathcal{D}_\text{flow}$, to bridge domain gaps across imaging modalities. It achieves zero-shot generalization to unseen domains and strong one- and few-shot performance across four (pre-)clinically relevant datasets, using a UNet-based segmentation architecture. The training data combine a domain randomization scheme with a mask- and class-conditioned flow matching generative model to produce large-scale, anatomically coherent image-mask pairs, yielding state-of-the-art Dice and clDice results and robust tubular vessel predictions. Ablation analyses indicate that all three data sources, including the flow matching generator, are needed, and the open-source checkpoints and code make vesselFM practical for accelerating vascular imaging research and reducing the annotation burden.
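For readers unfamiliar with the Dice metric named above, the following is a minimal sketch of how it is commonly computed on binary 3D masks. The function name `dice_score` and the toy arrays are illustrative assumptions, not taken from the vesselFM code base; clDice additionally relies on skeletonization of the masks and is not covered here.

```python
import numpy as np

def dice_score(pred, target):
    """Dice overlap 2|P ∩ T| / (|P| + |T|) for two binary masks (hypothetical helper)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Convention assumed here: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / denom

# Toy 3D volume with a thin "vessel" along one axis.
pred = np.zeros((4, 4, 4), dtype=bool)
target = np.zeros((4, 4, 4), dtype=bool)
pred[1, 1, :] = True      # 4 predicted voxels
target[1, 1, 1:] = True   # 3 annotated voxels, all inside the prediction

score = dice_score(pred, target)  # 2 * 3 / (4 + 3)
print(score)
```

Because vessels are thin, a prediction that is off by a single voxel can lose a large share of the overlap, which is one reason topology-aware metrics such as clDice are reported alongside Dice.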

Abstract

Segmenting 3D blood vessels is a critical yet challenging task in medical image analysis. This is due to significant imaging modality-specific variations in artifacts, vascular patterns and scales, signal-to-noise ratios, and background tissues. These variations, along with domain gaps arising from varying imaging protocols, limit the generalization of existing supervised learning-based methods, requiring tedious voxel-level annotations for each dataset separately. While foundation models promise to alleviate this limitation, they typically fail to generalize to the task of blood vessel segmentation, posing a unique, complex problem. In this work, we present vesselFM, a foundation model designed specifically for the broad task of 3D blood vessel segmentation. Unlike previous models, vesselFM can effortlessly generalize to unseen domains. To achieve zero-shot generalization, we train vesselFM on three heterogeneous data sources: a large, curated annotated dataset, data generated by a domain randomization scheme, and data sampled from a flow matching-based generative model. Extensive evaluations show that vesselFM outperforms state-of-the-art medical image segmentation foundation models across four (pre-)clinically relevant imaging modalities in zero-, one-, and few-shot scenarios, therefore providing a universal solution for 3D blood vessel segmentation.

Paper Structure

This paper contains 36 sections, 3 equations, 16 figures, 7 tables, 1 algorithm.

Figures (16)

  • Figure 1: VesselFM is trained in a supervised manner on image-mask pairs from three heterogeneous data sources ($\mathcal{D}_\text{real}$, $\mathcal{D}_\text{drand}$, and $\mathcal{D}_\text{flow}$) and subsequently applied in a zero-, one-, or few-shot fashion to new, unseen 3D blood vessel domains.
  • Figure 2: Schematic distributions of our three data sources $\mathcal{D}_\text{real}$ (shades of blue), $\mathcal{D}_\text{flow}$ (red), and $\mathcal{D}_\text{drand}$ (gray). While we aim to comprehensively cover the general domain of 3D vascular images with $\mathcal{D}_\text{drand}$, $\mathcal{D}_\text{flow}$ effectively broadens the distributions of $\mathcal{D}_\text{real}$. Note that segmentation masks are shown in translucent red in the exemplary images.
  • Figure 3: Slices of images $\mathcal{X}_\text{real}$ from $\mathcal{D}_\text{real}$. $\mathcal{D}_\text{real}$ contains vascular images of shape $128^3$ with matching voxel-level annotations collected from 23 datasets (classes are indicated in red) of diverse imaging modalities, depicting a wide range of anatomical regions.
  • Figure 4: a) Schematic overview of our domain randomized generative pipeline used to generate $\mathcal{D}_\text{drand}$ = $\{\mathcal{X}_\text{drand}, \mathcal{M}_\text{syn}\}$. We specifically highlight its three main components: foreground generation, background generation, and merging. Note that we indicate instances forwarded to the subsequent step in the color red for illustration purposes. b) Slices of exemplary images $\mathcal{X}_\text{drand}$, categorized as $c = 0$. The wide variety of generated, highly diverse images showcases the effectiveness of our proposed domain randomization strategy.
  • Figure 5: a) Sampling of synthetic images $\mathcal{X}_\text{flow}$ via our mask- and class-conditioned flow matching-based generative model. We explicitly show our sampling scheme, mapping a sample $x_{0} \sim \mathcal{N}(0, I)$ to an exemplary sample $x_{1}$ of class $\Tilde{21}$. In addition, we present a more detailed trajectory, which is plotted in 2D for improved visibility. b) Slices of exemplary images $\mathcal{X}_\text{flow}$, sampled from our generative model. Note that all of the depicted slices are conditioned on the same mask, and we solely vary the class. We would like to emphasize that our generative model is able to produce synthetic images almost indistinguishable from real images (compare with Fig. 3).
  • ...and 11 more figures
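
The sampling scheme described in the Figure 5 caption, which maps a noise sample $x_{0} \sim \mathcal{N}(0, I)$ to an image $x_{1}$ under mask and class conditioning, can be illustrated as numerical integration of an ODE $\mathrm{d}x/\mathrm{d}t = v(x, t, \text{mask}, c)$ from $t = 0$ to $t = 1$. The sketch below is a hedged illustration only: `euler_sample` and `toy_velocity` are hypothetical names, the fixed-step Euler solver is an assumption (the source does not state which solver is used), and the hand-written velocity field stands in for a learned network.

```python
import numpy as np

def toy_velocity(x, t, mask, cls, target):
    """Stand-in for a learned velocity network v(x, t, mask, cls).

    For a straight-line path x_t = (1 - t) * x0 + t * x1, the velocity
    pointing at a known endpoint `target` is (target - x) / (1 - t).
    A real model would predict this from x, t, the mask, and the class.
    """
    return (target - x) / (1.0 - t)

def euler_sample(velocity, x0, mask, cls, n_steps=10):
    """Integrate dx/dt = velocity(x, t, mask, cls) from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) never hits zero
        x = x + dt * velocity(x, t, mask, cls)
    return x

rng = np.random.default_rng(0)
shape = (8, 8, 8)

# Conditioning mask with a single straight "vessel"; class 21 is an arbitrary label.
mask = np.zeros(shape, dtype=bool)
mask[4, 4, :] = True

# Toy endpoint: bright vessel on a dark background (assumption for illustration).
target = np.where(mask, 1.0, 0.0)

x0 = rng.standard_normal(shape)  # x0 ~ N(0, I)
x1 = euler_sample(
    lambda x, t, m, c: toy_velocity(x, t, m, c, target),
    x0, mask, cls=21,
)
print(np.abs(x1 - target).max())
```

With this toy field the trajectory is a straight line from `x0` to `target`, so the Euler steps land on the endpoint up to floating-point error. A learned, conditioned velocity field would generally trace curved trajectories like the 2D one plotted in the figure.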