ROBUST-MIPS: A Combined Skeletal Pose and Instance Segmentation Dataset for Laparoscopic Surgical Instruments

Zhe Han, Charlie Budd, Gongyu Zhang, Huanyu Tian, Christos Bergeles, Tom Vercauteren

TL;DR

The authors argue that skeletal pose annotations are a more efficient annotation approach for surgical tools, striking a balance between richness of semantic information and ease of annotation, and thus allowing the pool of available annotated data to grow faster.

Abstract

Localisation of surgical tools constitutes a foundational building block for computer-assisted interventional technologies. Works in this field typically focus on training deep learning models to perform segmentation tasks. Performance of learning-based approaches is limited by the availability of diverse annotated data. We argue that skeletal pose annotations are a more efficient annotation approach for surgical tools, striking a balance between richness of semantic information and ease of annotation, thus allowing for accelerated growth of available annotated data. To encourage adoption of this annotation style, we present ROBUST-MIPS, a combined tool pose and tool instance segmentation dataset derived from the existing ROBUST-MIS dataset. Our enriched dataset facilitates the joint study of these two annotation styles and allows head-to-head comparison on various downstream tasks. To demonstrate the adequacy of pose annotations for surgical tool localisation, we set up a simple benchmark using popular pose estimation methods and observe high-quality results. To ease adoption, we release our benchmark models and custom tool pose annotation software together with the dataset.

Paper Structure

This paper contains 27 sections, 2 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Overview of the ROBUST-MIS data, the source data for the proposed ROBUST-MIPS. The directory structure is derived from the original ROBUST-MIS dataset (Roß et al., 2019). While the original dataset provides a 10-second video snippet (250 frames) together with the last raw frame and its instance segmentation mask, the proposed ROBUST-MIPS dataset extends this structure by incorporating skeletal pose annotations. The final extended directory structure of our contribution is detailed in a separate figure (label fig:MIPSstructure).
  • Figure 2: Examples of selecting keypoints for different types of surgical instruments. (a) Keypoints selected for an articulated surgical instrument. (b) Keypoints selected for a rigid surgical instrument.
  • Figure 3: Examples of selecting valid keypoints under different visibility conditions. (a) Selection of keypoints for partially occluded articulated surgical tools, where one tip point is considered to be in an unpredictable missing state. (b) Selection of keypoints for an articulated surgical tool in a closed state, where the two tips are considered to be at the same position. The second tip is labelled as missing. (c) Selection of keypoints for surgical tools with only the shaft visible in the FoV, resulting in missing labels for both tips. (d) Selection of keypoints when the instrument shaft extends beyond the image boundary. To maintain skeletal connectivity, both the HingePoint and EntryPoint are annotated in the padding area with out-of-bounds coordinates, despite being strictly invisible. (e) Selection of keypoints where the HingePoint is masked by the circular FoV but remains within the image frame. The point is geometrically predicted from the tips and arm structure and has valid positive coordinates. In this case, the HingePoint and EntryPoint are both labelled as occluded. (f) Selection of keypoints where one of the tips is inferred from the instrument’s structural characteristics; such predicted keypoints are annotated as occluded.
  • Figure 4: Handling of tool trocar cannulas in ROBUST-MIPS annotations. (a) Pose annotation example in the presence of a tool trocar cannula. The distal end of the cannula is identified as the EntryPoint, while the instrument shaft extends to the HingePoint. (b) The original segmentation annotation in ROBUST-MIS (Roß et al., 2019), where the trocar cannula is labelled as a distinct instance. (c) The refined segmentation annotation in the ROBUST-MIPS dataset (Han et al., 2025), where the trocar cannula mask is removed to focus solely on the surgical instrument.
  • Figure 5: (a) Example of a JSON annotation file from the custom annotation software. (b) Example of an annotation converted to the Microsoft COCO schema (Lin et al., 2014), which allows for broad compatibility with human pose learning frameworks.
  • ...and 3 more figures
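
The captions for Figures 3 and 5 describe three keypoint states (visible, occluded, missing) and a conversion to the COCO schema. The sketch below illustrates how such a conversion might look; it is not the authors' converter. The input layout, the tip names `Tip1` and `Tip2`, the keypoint order, and the mapping of states onto COCO visibility flags are all assumptions made for illustration. Only the names EntryPoint and HingePoint and the three states come from the captions. A complete COCO annotation would also carry fields such as `bbox`, `area` and `iscrowd`, which are omitted here.

```python
# Hypothetical keypoint order; only EntryPoint and HingePoint are named in the captions.
KEYPOINT_NAMES = ["EntryPoint", "HingePoint", "Tip1", "Tip2"]

# COCO visibility flags: 0 = not labelled, 1 = labelled but not visible,
# 2 = labelled and visible. The state-to-flag mapping is an assumption.
VISIBILITY = {"missing": 0, "occluded": 1, "visible": 2}


def to_coco_keypoints(points):
    """Flatten {name: (x, y, state)} into COCO's [x1, y1, v1, x2, y2, v2, ...] list."""
    flat = []
    for name in KEYPOINT_NAMES:
        x, y, state = points.get(name, (0, 0, "missing"))
        v = VISIBILITY[state]
        if v == 0:
            x, y = 0, 0  # COCO convention: unlabelled keypoints are stored as zeros
        flat.extend([x, y, v])
    return flat


def to_coco_annotation(points, image_id, annotation_id, category_id=1):
    """Build a partial COCO keypoint annotation for one instrument instance."""
    keypoints = to_coco_keypoints(points)
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        "keypoints": keypoints,
        # COCO counts every labelled keypoint (v > 0), visible or not.
        "num_keypoints": sum(1 for v in keypoints[2::3] if v > 0),
    }


# Illustrative values only: an EntryPoint outside the image (negative x),
# treated here as occluded, and a second tip that is missing.
example = {
    "EntryPoint": (-12.0, 240.5, "occluded"),
    "HingePoint": (310.0, 262.0, "visible"),
    "Tip1": (388.0, 250.0, "visible"),
    "Tip2": (0, 0, "missing"),
}
annotation = to_coco_annotation(example, image_id=1, annotation_id=1)
print(annotation["keypoints"])
```

Keeping out-of-bounds coordinates unchanged, rather than clipping them to the image, preserves the skeletal connectivity that caption 3(d) describes. Whether a given training framework accepts negative coordinates should be checked before use.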