ETSM: Automating Dissection Trajectory Suggestion and Confidence Map-Based Safety Margin Prediction for Robot-assisted Endoscopic Submucosal Dissection

Mengya Xu, Wenjin Mo, Guankun Wang, Huxin Gao, An Wang, Long Bai, Chaoyang Lyu, Xiaoxiao Yang, Zhen Li, Hongliang Ren

TL;DR

This work targets safer and more efficient robot-assisted ESD by predicting both optimal dissection trajectories and confidence-map-based safety margins. It introduces the ETSM dataset, which provides annotated dissection trajectories and safety margins from dual-arm robotic ESD videos, and develops an AI planning system comprising a dissection trajectory predictor and a confidence-map-based margin predictor (RCMNet). RCMNet pairs a Transformer-based encoder (DINOv2) with an All-MLP regression decoder to produce dense 1-channel confidence maps, trained with a weighted MSE that emphasizes margins. Experiments show strong trajectory prediction with BC, competitive safety-margin prediction across resolutions, robustness to common corruptions, and reasonable out-of-domain generalization, which points to practical potential for intraoperative guidance and future automation in ESD. The study contributes what the authors describe as the first regression-based approach to delineating varying safety levels in dissection areas, and it lays groundwork for temporally aware improvements and broader clinical deployment.
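The weighted MSE and the MAE metric mentioned here can be illustrated with a minimal sketch. The summary does not give the exact weighting scheme, so the per-pixel `margin_mask`, the `margin_weight` value, and the normalization by the sum of weights are all assumptions made for illustration, not the authors' formulation.

```python
# Minimal sketch of a margin-weighted MSE and a plain MAE on 2D maps.
# ASSUMPTIONS (not from the paper): margin pixels are flagged by a binary
# mask, they receive a constant weight, and the loss is normalized by the
# sum of the weights.

def weighted_mse(pred, target, margin_mask, margin_weight=5.0):
    """Squared error averaged with margin pixels counted margin_weight times."""
    total, weight_sum = 0.0, 0.0
    for p_row, t_row, m_row in zip(pred, target, margin_mask):
        for p, t, m in zip(p_row, t_row, m_row):
            w = margin_weight if m else 1.0
            total += w * (p - t) ** 2
            weight_sum += w
    return total / weight_sum


def mean_absolute_error(pred, target):
    """Unweighted mean absolute error over all pixels."""
    n = sum(len(row) for row in target)
    abs_sum = sum(
        abs(p - t)
        for p_row, t_row in zip(pred, target)
        for p, t in zip(p_row, t_row)
    )
    return abs_sum / n


# Toy 2x3 confidence maps; the last column is treated as the margin (hypothetical).
target = [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
pred = [[0.0, 0.5, 0.5], [0.5, 0.5, 1.0]]
margin_mask = [[0, 0, 1], [0, 0, 1]]

uniform_loss = weighted_mse(pred, target, margin_mask, margin_weight=1.0)
margin_loss = weighted_mse(pred, target, margin_mask, margin_weight=5.0)
mae = mean_absolute_error(pred, target)
```

In this toy case one error falls on a margin pixel and one on a non-margin pixel, so raising `margin_weight` increases the share of the loss attributed to the margin error. With a weight of 1 the function reduces to the ordinary MSE.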

Abstract

Robot-assisted Endoscopic Submucosal Dissection (ESD) improves the surgical procedure by providing a more comprehensive view through advanced robotic instruments and bimanual operation, thereby enhancing dissection efficiency and accuracy. Accurate prediction of dissection trajectories is crucial for better decision-making, reducing intraoperative errors, and improving surgical training. Nevertheless, predicting these trajectories is challenging because tumor margins vary and visual conditions change dynamically. To address this issue, we create the ESD Trajectory and Confidence Map-based Safety Margin (ETSM) dataset with $1849$ short clips, focusing on submucosal dissection with a dual-arm robotic system. We also introduce a framework that combines optimal dissection trajectory prediction with a confidence map-based safety margin, providing a more secure and intelligent decision-making tool to minimize surgical risks in ESD procedures. Additionally, we propose the Regression-based Confidence Map Prediction Network (RCMNet), which uses a regression approach to predict confidence maps for dissection areas, thereby delineating various levels of safety margins. We evaluate RCMNet in three experimental setups: in-domain evaluation, robustness assessment, and out-of-domain evaluation. Experimental results show that our approach performs well on the confidence map-based safety margin prediction task, achieving a mean absolute error (MAE) of only $3.18$. To the best of our knowledge, this is the first study to apply a regression approach to visual guidance that delineates varying safety levels of dissection areas. Our approach bridges gaps in current research by improving prediction accuracy and enhancing the safety of the dissection process, which is of clinical significance in practice.

Paper Structure

This paper contains 19 sections, 2 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Overview of our workflow. (a) Dual-arm robotic ESD vs. conventional ESD: in conventional ESD, the limited endoscopic view shows only part of the dissection trajectory, whereas in robot-assisted ESD the trajectory is more complete. (b) ETSM dataset construction: preprocessing involves video downsampling, frame extraction, and removal of black margins. Annotation includes marking dissection trajectories as a series of 2D coordinates and annotating safety margins. The ground-truth confidence map for the dissection area is generated from the optimal dissection trajectory and safety margin annotations. (c) AI-powered surgical planning system: it comprises two functional modules, the dissection trajectory suggestion module and the confidence map-based safety margin prediction module. The output provides intraoperative decision support through visual guidance and may also facilitate future automation of dissection subtasks.
  • Figure 2: Case illustrations of confidence generation. (a) The confidence of a point in the dissection area depends on its distance from the optimal dissection trajectory and from the edge. (b) Searching for the edge point by smallest Euclidean distance is easily misled by the edge on the other side. (c) Searching for edge points by angular difference avoids this error and lets the confidence transition in a fixed direction. (d) Full view of the dissection area confidence for this case. (e) For a curved dissection area, a threshold on the Euclidean distance between the area point and the edge point is also needed.
  • Figure 3: Overview of our RCMNet. The image encoder, which uses a pre-trained DINOv2, extracts multi-scale feature representations from intermediate transformer layers. Features from each layer are concatenated along the channel dimension and upsampled by a factor of 4. These multi-scale features are then fed into a regression decoder based on an All-MLP network. The decoder first aligns channel dimensions via an MLP, upsamples the features back to the input resolution, and fuses them through another MLP. Finally, an MLP confidence prediction head generates a 1-channel confidence map.
  • Figure 4: Our ETSM dataset visualization. Frames are selected from videos of submucosal dissection tasks performed using a dual-arm robotic system.
  • Figure 5: Results visualization. (a) Dissection trajectory suggestion results. Green lines represent the ground truth dissection trajectories, while red lines indicate the predicted trajectories. (b) Confidence map-based safety margin prediction results.
  • ...and 2 more figures
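
The confidence-generation idea in the Figure 2 caption can be sketched in a few lines. This is only an illustration under stated assumptions: the captions do not give the actual formula, so the linear falloff `1 - d_traj / d_edge`, the function names `angle_between` and `confidence_at`, and the parameter `max_edge_dist` are hypothetical. The sketch picks the edge point by smallest angular difference (panel c) and applies a Euclidean distance threshold (panel e).

```python
import math

# Hypothetical sketch of confidence generation for one point.
# ASSUMPTIONS (not from the paper): confidence falls off linearly from 1 on
# the trajectory to 0 at the matched edge point; names and parameters are
# illustrative only.

def angle_between(u, v):
    """Unsigned angle in radians between 2D vectors u and v."""
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    return abs(math.atan2(cross, dot))


def confidence_at(point, trajectory, edge_points, max_edge_dist):
    """Confidence in [0, 1] for a point inside the dissection area."""
    # Closest point on the (discretized) optimal dissection trajectory.
    t = min(trajectory, key=lambda q: math.dist(point, q))
    d_traj = math.dist(point, t)
    if d_traj == 0.0:
        return 1.0
    direction = (point[0] - t[0], point[1] - t[1])
    # Distance threshold between area point and edge point (panel e).
    candidates = [e for e in edge_points if math.dist(point, e) <= max_edge_dist]
    if not candidates:
        return 0.0
    # Angular search (panel c): the edge point lying in the same direction
    # from the trajectory as the query point, not merely the nearest one.
    e = min(
        candidates,
        key=lambda q: angle_between(direction, (q[0] - t[0], q[1] - t[1])),
    )
    d_edge = math.dist(t, e)
    if d_edge == 0.0:
        return 0.0
    return max(0.0, 1.0 - d_traj / d_edge)


# Toy case: straight trajectory on y = 0, far edge at y = 3, near edge at y = -1.
trajectory = [(float(x), 0.0) for x in range(5)]
edge_points = [(float(x), 3.0) for x in range(5)] + [(float(x), -1.0) for x in range(5)]
query = (2.0, 0.8)

nearest_edge = min(edge_points, key=lambda q: math.dist(query, q))
conf = confidence_at(query, trajectory, edge_points, max_edge_dist=5.0)
```

In this toy layout the query point lies above the trajectory, yet the edge point nearest to it by Euclidean distance is on the lower edge, which is the failure mode described in panel (b). The angular search matches it to the upper edge instead.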