Generalizing Segmentation Foundation Model Under Sim-to-real Domain-shift for Guidewire Segmentation in X-ray Fluoroscopy

Yuxuan Wen, Evgenia Roussinova, Olivier Brina, Paolo Machi, Mohamed Bouri

TL;DR

This work proposes a coarse-to-fine sim-to-real domain adaptation framework that adapts SAM to guidewire segmentation in X-ray fluoroscopy without any annotation on the target domain. It also develops a weakly supervised self-training architecture that fine-tunes an end-to-end student SAM with the coarse labels.

Abstract

Guidewire segmentation during endovascular interventions holds the potential to significantly enhance procedural accuracy, improving visualization and providing critical feedback that can support both physicians and robotic systems in navigating complex vascular pathways. Unlike supervised segmentation networks, which need many expensive expert-annotated labels, sim-to-real domain adaptation approaches utilize synthetic data from simulations, offering a cost-effective solution. The success of models like Segment-Anything (SAM) has driven advancements in image segmentation foundation models with strong zero/few-shot generalization through prompt engineering. However, these models struggle with medical images such as X-ray fluoroscopy and with domain shifts in the data. Given the challenges of acquiring annotations and the accessibility of labeled simulation data, we propose a sim-to-real domain adaptation framework with a coarse-to-fine strategy to adapt SAM to X-ray fluoroscopy guidewire segmentation without any annotation on the target domain. We first generate pseudo-labels using a simple source-image style transfer technique that preserves the guidewire structure. Then, we develop a weakly supervised self-training architecture to fine-tune an end-to-end student SAM with the coarse labels by imposing consistency regularization and supervision from the teacher SAM network. We validate the effectiveness of the proposed method on a publicly available Cardiac dataset and an in-house Neurovascular dataset, where our method surpasses both pre-trained SAM and many state-of-the-art domain adaptation techniques by a large margin. Our code will be made public on GitHub soon.
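
To make the fine stage more concrete, the following is a minimal sketch of how a weak-supervision term and a teacher-consistency term could be combined into one training objective for the student. It is an illustration only, not the authors' implementation: the function names, the choice of binary cross-entropy for both terms, the 0.5 threshold on the teacher output, and the `consistency_weight` parameter are all assumptions, since the abstract does not specify the loss functions.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between foreground probabilities and targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        # Clamp to avoid log(0) for saturated predictions.
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)


def self_training_loss(student_probs, teacher_probs, pseudo_labels,
                       consistency_weight=0.5):
    """Hypothetical combined objective for the student (assumed form).

    student_probs: per-pixel foreground probabilities from the student.
    teacher_probs: per-pixel foreground probabilities from the teacher.
    pseudo_labels: coarse 0/1 labels produced in the coarse stage.
    """
    # Weak supervision: pull the student toward the coarse pseudo-labels.
    weak_term = binary_cross_entropy(student_probs, pseudo_labels)
    # Consistency: pull the student toward the thresholded teacher prediction.
    teacher_mask = [1.0 if p >= 0.5 else 0.0 for p in teacher_probs]
    consistency_term = binary_cross_entropy(student_probs, teacher_mask)
    return weak_term + consistency_weight * consistency_term


if __name__ == "__main__":
    labels = [1.0, 0.0, 1.0, 0.0]
    teacher = [0.8, 0.2, 0.7, 0.1]
    agreeing_student = [0.9, 0.1, 0.8, 0.2]
    disagreeing_student = [0.1, 0.9, 0.2, 0.8]
    print(self_training_loss(agreeing_student, teacher, labels))
    print(self_training_loss(disagreeing_student, teacher, labels))
```

In this sketch a student that agrees with both the pseudo-labels and the teacher receives a lower loss than one that contradicts them; how the two terms are actually weighted in the paper is not stated in the abstract.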

Paper Structure

This paper contains 24 sections, 9 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Problem formulation and method overview. (a) The setups in the source domain and target domain. (b) A general framework of the proposed coarse-to-fine sim-to-real domain adaptation method, where we generate the pseudo-labels in the coarse stage and utilize weak supervision and self-training to train a student-teacher SAM network. The "fire" symbol means the trainable parameters, while the "snowflake" symbol means the parameters are frozen without updating.
  • Figure 2: The proposed sim-to-real adaptation framework, which contains a coarse stage ((a)-(c)) that prepares the pseudo-labels for weak supervision and a fine stage (d) that generates the final prediction by fine-tuning a student-teacher network with self-training losses. "Weak prompts" are generated from pseudo-labels in the form of either point or box during training. The two types of prompts, as well as their combination, were evaluated (see Section IV.F.3). "Tea" means the teacher SAM and "Stu" means the student SAM. The "fire" symbol means the trainable parameters, while the "snowflake" symbol means the parameters are frozen without updating.
  • Figure 3: Visual comparison on the Cardiac dataset. (a) is the original X-ray fluoroscopy image, (b) is the corresponding label, (c), (d), (e), and (f) are the prediction results from SAM (Kirillov et al., 2023) with box prompt, Direct Transfer, our end-to-end student model, and our teacher model with box prompt, respectively.
  • Figure 4: Visual comparison on the Neurovascular dataset. (a) is the original X-ray fluoroscopy image, (b) is the corresponding label, (c), (d), (e), and (f) are the prediction results from SAM (Kirillov et al., 2023) with box prompt, Direct Transfer, our end-to-end student model, and our teacher model with box prompt, respectively.
  • Figure 5: Visualization of the learning process and the self-training strategy. The evolution of the IoU scores highlights the convergence of both the teacher and student network in the fine stage. The "fire" symbol means the trainable parameters, while the "snowflake" symbol means the parameters are frozen without updating. We froze the LoRA parameters in the image encoder of the teacher model after warmup to prevent the negative impact of self-training.
  • ...and 1 more figure
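
The Figure 2 caption mentions that "weak prompts" are derived from the pseudo-labels as either a point or a box. A minimal sketch of one way to do this from a binary mask is shown below. The helper names `box_prompt` and `point_prompt` are hypothetical, and sampling the point uniformly from foreground pixels is an assumption; the caption does not say how the point is chosen. The box uses the (x_min, y_min, x_max, y_max) convention and the point uses (x, y).

```python
import random


def foreground_pixels(mask):
    """List the (x, y) coordinates of all non-zero pixels in a 2D mask."""
    return [(x, y)
            for y, row in enumerate(mask)
            for x, value in enumerate(row)
            if value]


def box_prompt(mask):
    """Tightest (x_min, y_min, x_max, y_max) box around the foreground, or None."""
    pixels = foreground_pixels(mask)
    if not pixels:
        return None  # An empty pseudo-label yields no prompt.
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return (min(xs), min(ys), max(xs), max(ys))


def point_prompt(mask, rng=random):
    """One (x, y) foreground pixel sampled uniformly (assumed strategy), or None."""
    pixels = foreground_pixels(mask)
    if not pixels:
        return None
    return rng.choice(pixels)


if __name__ == "__main__":
    # Toy pseudo-label: a short L-shaped "guidewire" in a 3x4 image.
    pseudo_label = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
    ]
    print(box_prompt(pseudo_label))
    print(point_prompt(pseudo_label, random.Random(0)))
```

Because the pseudo-labels are coarse, prompts built this way inherit their errors, which is presumably why the caption calls them weak.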