2017 Robotic Instrument Segmentation Challenge

Max Allan, Alex Shvets, Thomas Kurmann, Zichen Zhang, Rahul Duggal, Yun-Hsuan Su, Nicola Rieke, Iro Laina, Niveditha Kalavakonda, Sebastian Bodenstedt, Luis Herrera, Wenqi Li, Vladimir Iglovikov, Huoling Luo, Jian Yang, Danail Stoyanov, Lena Maier-Hein, Stefanie Speidel, Mahdi Azizian

TL;DR

The study presents the 2017 Robotic Instrument Segmentation Challenge, which provided a hand-labeled porcine nephrectomy dataset for three segmentation tasks (binary, parts, and type) and ranked ten teams by mean IoU. The submissions span a range of architectures, from residual CNNs and cascaded FCNs to TernausNet and ToolNet, with data augmentation and specialized preprocessing playing key roles. MIT consistently achieved top performance on the binary and parts tasks and on many of the type segmentation datasets, highlighting the value of pre-trained encoders and multi-task training. The work also discusses the limitations of a small dataset and the difficulty of labeling, and points to 2018 efforts to provide denser tissue annotations as a step toward more comprehensive surgical scene understanding.
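As a rough illustration of the mean IoU metric mentioned above, the following sketch computes per-label intersection over union on two small label maps and averages the result. The function names, the integer label values, and the choice to skip labels absent from both maps are assumptions made for this example; they are not taken from the challenge's scoring protocol.

```python
def class_iou(pred, truth, label):
    """IoU of one label over two equally sized 2D label maps (lists of rows).

    Returns None when the label appears in neither map, so the caller can
    decide how to treat it (an assumption of this sketch).
    """
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            in_pred = p == label
            in_truth = t == label
            if in_pred and in_truth:
                intersection += 1
            if in_pred or in_truth:
                union += 1
    if union == 0:
        return None
    return intersection / union


def mean_iou(pred, truth, labels):
    """Average IoU over the given labels, skipping labels absent from both maps."""
    scores = [class_iou(pred, truth, label) for label in labels]
    scores = [s for s in scores if s is not None]
    if not scores:
        return None
    return sum(scores) / len(scores)


# Hypothetical 2x3 label maps: 0 = background, 1 and 2 = two instrument labels.
truth = [[0, 1, 1],
         [0, 2, 2]]
pred = [[0, 1, 2],
        [0, 2, 2]]

print(class_iou(pred, truth, 1))     # 1 shared pixel out of 2 in the union
print(class_iou(pred, truth, 2))     # 2 shared pixels out of 3 in the union
print(mean_iou(pred, truth, [1, 2]))
```

For binary segmentation the label list would hold a single foreground label; for parts and type segmentation it would hold one label per part or instrument type.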

Abstract

In mainstream computer vision and machine learning, public datasets such as ImageNet, COCO and KITTI have helped drive enormous improvements by enabling researchers to understand the strengths and limitations of different algorithms via performance comparison. However, this type of approach has had limited translation to problems in robotic-assisted surgery, as this field has never established the same level of common datasets and benchmarking methods. In 2015 a sub-challenge was introduced at the EndoVis workshop where a set of robotic images was provided with automatically generated annotations from robot forward kinematics. However, there were issues with this dataset due to the limited background variation, lack of complex motion and inaccuracies in the annotation. In this work we present the results of the 2017 challenge on robotic instrument segmentation, which involved 10 teams participating in binary, parts-based and type-based segmentation of articulated da Vinci robotic instruments.

Paper Structure

This paper contains 25 sections, 3 equations, 14 figures, 7 tables.

Figures (14)

  • Figure 1: An example of masking an instrument so that the augmented reality overlay does not occlude the surgeon's view.
  • Figure 2: Example frames from the training datasets, ordered left to right from Dataset 1 to Dataset 8.
  • Figure 3: A ground truth overlay showing example da Vinci Xi instruments. The different parts of the instrument that are annotated in the parts-based segmentation challenge are illustrated with green, red and blue colors. An interesting case is the Monopolar Curved Scissors (2nd from left), which has a protective sheath to insulate the electric current used to provide electro-cautery features. We decided in this case to label the entire sheath as shaft, as there is no visible wrist for this instrument.
  • Figure 4: The different instrument types used in our type-based segmentation challenge. (a) shows the Maryland Bipolar Forceps and (b) shows the Fenestrated Bipolar instruments, which we combine into a single label, Bipolar Forceps, due to their similar appearance. (c) shows the Prograsp Forceps instrument. (d) shows the Large Needle Driver instrument. (e) shows the Vessel Sealer, the most visually distinctive instrument in our dataset. (f) shows the Grasping Retractor. (g) shows a drop-in Ultrasound probe from BK Medical, which was present in our dataset but not labelled as an instrument, and (h) shows the Monopolar Curved Scissors.
  • Figure 5: The network architecture from the team at NCT. The convolutional layer notation is kernel size, output dimensions, stride size and padding. The network has two output layers, one providing part-based segmentation and the other providing type segmentation.
  • ...and 9 more figures