Assessing the generalization performance of SAM for ureteroscopy scene understanding

Martin Villagrana, Francisco Lopez-Tiro, Clement Larose, Gilberto Ochoa-Ruiz, Christian Daul

TL;DR

This work addresses the challenge of robust kidney stone segmentation in ureteroscopy images by evaluating the Segment Anything Model (SAM) against traditional U‑Net variants. Using four diverse datasets and two- and three-class configurations, SAM is trained on a single distribution and tested on both in-distribution and out-of-distribution data, demonstrating competitive in-distribution performance ($\text{Accuracy}=97.68\pm3.04$, $\text{Dice}=97.78\pm2.47$, $\text{IoU}=95.76\pm4.18$) while delivering significantly stronger generalization on unseen data (outperforming all U‑Net variants by up to $23\%$ IoU). A two-phase SAM setup enables effective three-class segmentation (stone, laser fiber, tissue) without retraining, achieving high cross-domain accuracy. Overall, SAM exhibits precise, artifact-free segmentations and robust cross-dataset transfer, signaling strong potential for scalable clinical image analysis in variable ureteroscopic environments.

Abstract

The segmentation of kidney stones is regarded as a critical preliminary step to enable the identification of urinary stone types through machine- or deep-learning-based approaches. In urology, manual segmentation is considered tedious and impractical due to the typically large scale of image databases and the continuous generation of new data. In this study, the potential of the Segment Anything Model (SAM) -- a state-of-the-art deep learning framework -- is investigated for the automation of kidney stone segmentation. The performance of SAM is evaluated in comparison to traditional models, including U-Net, Residual U-Net, and Attention U-Net, which, despite their efficiency, frequently exhibit limitations in generalizing to unseen datasets. The findings highlight SAM's superior adaptability and efficiency. While SAM achieves comparable performance to U-Net on in-distribution data (Accuracy: 97.68 ± 3.04; Dice: 97.78 ± 2.47; IoU: 95.76 ± 4.18), it demonstrates significantly enhanced generalization capabilities on out-of-distribution data, surpassing all U-Net variants by margins of up to 23 percent.
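
For reference, the three reported metrics can be computed from a binary predicted mask and its ground truth roughly as in the sketch below. It assumes the standard pixel-wise definitions of accuracy, Dice, and IoU. The function name `segmentation_metrics`, the toy masks, and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not details taken from the paper. The function returns fractions in [0, 1], whereas the abstract reports percentages.

```python
import numpy as np


def segmentation_metrics(pred, truth):
    """Pixel-wise accuracy, Dice, and IoU for two binary masks of equal shape.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("pred and truth must have the same shape")

    tp = int(np.logical_and(pred, truth).sum())    # stone predicted, stone present
    fp = int(np.logical_and(pred, ~truth).sum())   # stone predicted, none present
    fn = int(np.logical_and(~pred, truth).sum())   # stone missed
    tn = int(np.logical_and(~pred, ~truth).sum())  # background correctly rejected

    accuracy = (tp + tn) / pred.size
    # Assumed convention: two empty masks count as a perfect match.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    return {"accuracy": accuracy, "dice": dice, "iou": iou}


# Toy 2x4 masks (1 = kidney stone, 0 = tissue), chosen only for illustration.
truth = [[1, 1, 0, 0],
         [1, 1, 0, 0]]
pred = [[1, 1, 1, 0],
        [1, 0, 0, 0]]
print(segmentation_metrics(pred, truth))
```

On these toy masks there are 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so accuracy and Dice are both 0.75 and IoU is 0.6. Dice is never lower than IoU on the same pair of masks, which matches the ordering of the in-distribution values above.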

Paper Structure

This paper contains 14 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Comparison framework for kidney stone segmentation methods. In traditional approaches (top), frames are extracted from ureteroscopy videos to create Dataset $D_i$ (distribution $i$), followed by manual expert labeling. The Segment Anything Model pipeline (bottom) is trained on $D_i$ and its labels, then performs inference on a distinct unlabeled Dataset $D_j$ (distribution $j$). Critical note: $D_i$ and $D_j$ represent different data distributions.
  • Figure 2: The four datasets (distributions) of kidney stone images are displayed alongside their corresponding segmentation masks, which include the stone and laser fiber. From right to left: Dataset A (in vivo endoscopic), Dataset B (ex vivo endoscopic), Dataset C (in vivo endoscopic), and Dataset D (ex vivo CCD camera). All datasets enable two-class segmentation (kidney stone and tissue), while only Datasets A and C include a third class (laser).
  • Figure 3: A qualitative comparison is presented across rows (Datasets A-D) and columns (kidney stone image, ground truth mask, U-Net, Residual U-Net, Attention U-Net, and SAM), where results are color-coded as: blue (true positives/correct segmentation), red (oversegmentation/false positives), and green (undersegmentation/false negatives).
  • Figure 4: Qualitative comparison of segmentation results for three classes (kidney stone, laser, and surrounding tissue). From left to right: Kidney stone image, segmentation mask (ground truth), and the prediction generated by the SAM model trained on Dataset C. The first row corresponds to in-distribution results, while the second represents out-of-distribution performance.
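
The color coding described for Figure 3 (blue for true positives, red for false positives, green for false negatives) can be reproduced from a predicted mask and its ground truth along the lines of the sketch below. The function name `error_overlay`, the exact RGB values, and the black background for true negatives are assumptions made for illustration; the paper's own rendering may differ.

```python
import numpy as np

# Assumed RGB values for the three categories named in the Figure 3 caption.
BLUE = (0, 0, 255)    # true positive: correctly segmented
RED = (255, 0, 0)     # false positive: oversegmentation
GREEN = (0, 255, 0)   # false negative: undersegmentation


def error_overlay(pred, truth):
    """Return an (H, W, 3) uint8 image color-coding segmentation errors.

    Hypothetical helper for illustration; true negatives stay black.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("pred and truth must have the same shape")

    overlay = np.zeros(pred.shape + (3,), dtype=np.uint8)
    overlay[pred & truth] = BLUE
    overlay[pred & ~truth] = RED
    overlay[~pred & truth] = GREEN
    return overlay


# Toy 2x4 masks (1 = kidney stone, 0 = tissue), chosen only for illustration.
truth = [[1, 1, 0, 0],
         [1, 1, 0, 0]]
pred = [[1, 1, 1, 0],
        [1, 0, 0, 0]]
overlay = error_overlay(pred, truth)
print(overlay.shape)
```

The resulting array can be passed to any image viewer or blended with the original frame to inspect where a model over- or undersegments.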