PDZSeg: Adapting the Foundation Model for Dissection Zone Segmentation with Visual Prompts in Robot-assisted Endoscopic Submucosal Dissection

Mengya Xu, Wenjin Mo, Guankun Wang, Huxin Gao, An Wang, Zhen Li, Xiaoxiao Yang, Hongliang Ren

TL;DR

This is the first study to integrate visual prompt design into dissection zone segmentation, supported by the novel ESD-DZSeg dataset as a benchmark for dissection zone segmentation in ESD.

Abstract

Purpose: Endoscopic surgical environments present challenges for dissection zone segmentation due to unclear boundaries between tissue types, leading to segmentation errors where models misidentify or overlook edges. This study aims to provide precise dissection zone suggestions during endoscopic submucosal dissection (ESD) procedures, enhancing ESD safety.

Methods: We propose the Prompted-based Dissection Zone Segmentation (PDZSeg) model, designed to leverage diverse visual prompts such as scribbles and bounding boxes. By overlaying these prompts onto images and fine-tuning a foundational model on a specialized dataset, our approach improves segmentation performance and user experience through flexible input methods.

Results: The PDZSeg model was validated using three experimental setups: in-domain evaluation, variability in visual prompt availability, and robustness assessment. Using the ESD-DZSeg dataset, results show that our method outperforms state-of-the-art segmentation approaches. This is the first study to integrate visual prompt design into dissection zone segmentation.

Conclusion: The PDZSeg model effectively utilizes visual prompts to enhance segmentation performance and user experience, supported by the novel ESD-DZSeg dataset as a benchmark for dissection zone segmentation in ESD. Our work establishes a foundation for future research.
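The Methods describe overlaying visual prompts directly onto the image before it reaches the model. The following is a minimal sketch of that idea in NumPy, not the authors' implementation: the function names `overlay_box` and `overlay_scribble`, the colors, and the line widths are all assumptions chosen for illustration.

```python
import numpy as np


def overlay_box(image, box, color=(0, 255, 0), thickness=2):
    """Draw a rectangle outline on a copy of an H x W x 3 uint8 image.

    `box` is (x0, y0, x1, y1) in pixel coordinates, corners inclusive.
    Hypothetical helper: colors and thickness are illustrative choices.
    """
    out = image.copy()
    h, w = out.shape[:2]
    x0, y0, x1, y1 = box
    # Clamp to the image and make sure (x0, y0) is the top-left corner.
    x0, x1 = sorted((int(np.clip(x0, 0, w - 1)), int(np.clip(x1, 0, w - 1))))
    y0, y1 = sorted((int(np.clip(y0, 0, h - 1)), int(np.clip(y1, 0, h - 1))))
    t = thickness
    out[y0:min(y0 + t, y1 + 1), x0:x1 + 1] = color      # top edge
    out[max(y0, y1 - t + 1):y1 + 1, x0:x1 + 1] = color  # bottom edge
    out[y0:y1 + 1, x0:min(x0 + t, x1 + 1)] = color      # left edge
    out[y0:y1 + 1, max(x0, x1 - t + 1):x1 + 1] = color  # right edge
    return out


def overlay_scribble(image, points, color=(255, 0, 0), radius=1):
    """Draw a polyline through `points` [(x, y), ...] on a copy of the image.

    Each segment is sampled densely and stamped with a small square brush
    of half-width `radius` (an assumed brush shape).
    """
    out = image.copy()
    h, w = out.shape[:2]
    for (xa, ya), (xb, yb) in zip(points[:-1], points[1:]):
        n = int(max(abs(xb - xa), abs(yb - ya))) + 1
        xs = np.rint(np.linspace(xa, xb, n)).astype(int)
        ys = np.rint(np.linspace(ya, yb, n)).astype(int)
        for x, y in zip(xs, ys):
            out[max(0, y - radius):min(h, y + radius + 1),
                max(0, x - radius):min(w, x + radius + 1)] = color
    return out


# Usage: a blank frame stands in for an endoscopic image.
frame = np.zeros((64, 64, 3), dtype=np.uint8)
prompted = overlay_scribble(overlay_box(frame, (10, 12, 40, 30)),
                            [(15, 20), (25, 24), (35, 18)])
```

Both helpers return a new array and leave the input untouched, so the same frame can be rendered with different prompt types. The prompted image would then be passed to the segmentation model in place of the raw frame.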

Paper Structure

This paper contains 11 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Overview. (a) Robotic ESD procedures were recorded from ex-vivo porcine models using our custom dual-arm robotic platform [gao2024transendoscopic]. Robotic ESD provides improved visualization of the submucosal layer, leading to a more complete dissection zone compared to conventional ESD. (b) Detailed dissection zone contour guidance is difficult for doctors to provide in real time. (c) We introduce a model, PDZSeg, specifically developed for segmenting dissection zones and capable of integrating different visual prompts provided by the experienced surgeon, such as scribbles and bounding boxes. Our approach overlays these visual cues directly onto the images and fine-tunes the foundation model on them. Our model then delivers precise dissection zone contours to the inexperienced surgeon.
  • Figure 2: Our ESD-DZSeg dataset. The blue areas represent the ground truth of the dissection zone. The complexities of the endoscopic scene understanding task can be observed from the dataset. The features across various regions exhibit significant similarity, and the boundaries between these regions are often ambiguous.
  • Figure 3: Results visualization. The first two columns display the ground truth and the segmentation masks predicted by our model. To enhance the comparison, we have extracted the outlines of the segmentation masks, which more clearly highlight the boundaries of the dissection zones. The differently colored contours represent the results obtained from corresponding methods.
  • Figure 4: Corrupted images at a severity level of 3.
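The Figure 3 caption mentions extracting the outlines of segmentation masks so that boundaries are easier to compare. The source does not say how the outlines were obtained. One simple possibility, sketched here under that assumption, is to keep every mask pixel that has at least one 4-connected neighbour outside the mask. The name `mask_outline` is hypothetical.

```python
import numpy as np


def mask_outline(mask):
    """Return a boolean array marking the boundary pixels of a binary mask.

    A pixel is on the boundary if it belongs to the mask and at least one of
    its up/down/left/right neighbours does not (pixels beyond the image edge
    count as background). This is an assumed definition for illustration.
    """
    m = np.asarray(mask, dtype=bool)
    padded = np.pad(m, 1, constant_values=False)
    # A pixel is interior when all four neighbours are also in the mask.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return m & ~interior


# Usage: a 3 x 3 square inside a 5 x 5 image.
demo = np.zeros((5, 5), dtype=bool)
demo[1:4, 1:4] = True
outline = mask_outline(demo)
```

For the 3 x 3 square, the outline is the ring of eight pixels around the centre. Outlines from several methods can then be drawn in different colors on the same frame.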