SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge

Dimitrios Psychogyios, Emanuele Colleoni, Beatrice Van Amsterdam, Chih-Yang Li, Shu-Yu Huang, Yuchong Li, Fucang Jia, Baosheng Zou, Guotai Wang, Yang Liu, Maxence Boels, Jiayu Huo, Rachel Sparks, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin, Mengya Xu, An Wang, Yanan Wu, Long Bai, Hongliang Ren, Atsushi Yamada, Yuriko Harai, Yuto Ishikawa, Kazuyuki Hayashi, Jente Simoens, Pieter DeBacker, Francesco Cisternino, Gabriele Furnari, Alex Mottrie, Federica Ferraguti, Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, Soohee Kim, Seung Hyun Lee, Kyu Eun Lee, Hyoun-Joong Kong, Kui Fu, Chao Li, Shan An, Stefanie Krell, Sebastian Bodenstedt, Nicolas Ayobi, Alejandra Perez, Santiago Rodriguez, Juanita Puentes, Pablo Arbelaez, Omid Mohareri, Danail Stoyanov

TL;DR

SAR-RARP50 introduces a multimodal, in-vivo dataset for robot-assisted radical prostatectomy covering both action recognition and instrument segmentation, with three evaluation tasks and standardized metrics $FWA$, $F1@10$, $mIoU$, $mNSD$, and final scores $Score_{ar}$, $Score_{s}$, and $Score_{mt}$. The challenge assesses single-task and multitask learning using real suturing videos, encouraging temporal attention-based architectures and cross-task information sharing. Across 12 teams, attention-based, two-stage approaches achieved top performance in both tasks, while multitask results were mixed, leaving the benefits of joint optimization inconclusive. The dataset’s real-world variability and comprehensive annotations motivate robust surgical AI systems and pave the way for future multimodal, cross-task research in real surgeries.
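Of the metrics named above, $mIoU$ is the easiest to illustrate. The sketch below computes a generic mean intersection-over-union on flattened label maps. It is not the challenge's official evaluation code: the function name `mean_iou`, the choice to average only over classes that appear in the prediction or the reference, and the toy inputs are all assumptions made for illustration, and the challenge may aggregate per frame or per class differently.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], ref: Sequence[int], num_classes: int) -> float:
    """Generic mean IoU over flattened label maps (illustrative, not official).

    Classes that occur in neither `pred` nor `ref` are skipped, which is an
    assumption of this sketch rather than a rule stated by the challenge.
    """
    if len(pred) != len(ref):
        raise ValueError("pred and ref must contain the same number of pixels")

    inter = [0] * num_classes  # pixels where both maps agree on class c
    union = [0] * num_classes  # pixels where either map shows class c

    for p, r in zip(pred, ref):
        if p == r:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[r] += 1

    # Per-class IoU for every class that appears at least once.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    # Toy 4-pixel example with two classes (hypothetical data).
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.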

Abstract

Surgical tool segmentation and action recognition are fundamental building blocks in many computer-assisted intervention applications, ranging from surgical skills assessment to decision support systems. Learning-based action recognition and segmentation approaches now outperform classical methods, but they rely on large annotated datasets. Furthermore, action recognition and tool segmentation algorithms are often trained and make predictions in isolation from each other, without exploiting potential cross-task relationships. With the EndoVis 2022 SAR-RARP50 challenge, we release the first multimodal, publicly available, in-vivo dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robot-Assisted Radical Prostatectomy (RARP). The aim of the challenge is twofold. First, to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain. Second, to further explore the potential of multitask-based learning approaches and determine their comparative advantage against their single-task counterparts. A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation. The complete SAR-RARP50 dataset is available at: https://rdr.ucl.ac.uk/projects/SARRARP50_Segmentation_of_surgical_instrumentation_and_Action_Recognition_on_Robot-Assisted_Radical_Prostatectomy_Challenge/191091

Paper Structure (36 sections, 17 equations, 7 figures, 16 tables)


Figures (7)

  • Figure 1: RARP-45 statistics. (a) Task duration variability (reported in seconds). The average duration is about 5 minutes, with large variability ranging from about 2 to 12 minutes. (b) Class distribution per sequence. Each bin represents the median class frequency over interventions, and error bars mark the 25th and 75th quantiles. Class G5 is absent in more than 75% of the interventions.
  • Figure 2: Corner-cases of our semantic segmentation protocol to ensure consistent labels across SAR-RARP50.
  • Figure 3: Segmentation class occurrence per sample in train and test sets. The y-axis corresponds to samples in the training set (left) and test set (right). Annotations show the percentage of samples depicting each class in the train set (blue) and test set (orange).
  • Figure 4: Timeseries action graph for Video 44 which corresponds to an operation performed by an expert surgeon. The upper segment of each box corresponds to method predictions, while R stands for reference.
  • Figure 5: Timeseries action graph for Video 46 which corresponds to an operation performed by a junior registrar. The upper segment of each box corresponds to method predictions, while R stands for reference.
  • ...and 2 more figures