CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools

Chinedu Innocent Nwoye, Kareem Elgohary, Anvita Srinivas, Fauzan Zaid, Joël L. Lavanchy, Nicolas Padoy

TL;DR

CholecTrack20 is a multi-perspective surgical tool tracking dataset derived from laparoscopic cholecystectomy, built to supply the context-rich data that conventional single-perspective tracking lacks. It defines intraoperative, intracorporeal, and visibility trajectories over 20 full-length videos annotated at 1 fps, yielding over 35K frames and 65K tool instances, each labeled with attributes such as tool category, identity, operator, phase, and visual challenges. Benchmarks of state-of-the-art detectors and MOT trackers reveal substantial gaps (HOTA < 45%), underscoring the need for context-aware re-identification and trajectory modeling in clinical settings. The dataset, accompanied by thorough QA and analysis, provides a foundation for developing robust AI-driven surgical assistance systems and is released under CC BY-NC-SA to encourage community adoption.

Abstract

Tool tracking in surgical videos is essential for advancing computer-assisted interventions, such as skill assessment, safety zone estimation, and human-machine collaboration. However, the lack of context-rich datasets limits AI applications in this field. Existing datasets rely on overly generic tracking formalizations that fail to capture surgical-specific dynamics, such as tools moving out of the camera's view or exiting the body. This results in less clinically relevant trajectories and a lack of flexibility for real-world surgical applications. Methods trained on these datasets often struggle with visual challenges such as smoke, reflection, and bleeding, further exposing the limitations of current approaches. We introduce CholecTrack20, a specialized dataset for multi-class, multi-tool tracking in surgical procedures. It redefines tracking formalization with three perspectives: (i) intraoperative, (ii) intracorporeal, and (iii) visibility, enabling adaptable and clinically meaningful tool trajectories. The dataset comprises 20 full-length surgical videos, annotated at 1 fps, yielding over 35K frames and 65K labeled tool instances. Annotations include spatial location, category, identity, operator, phase, and scene visual challenge. Benchmarking state-of-the-art methods on CholecTrack20 reveals significant performance gaps, with current approaches (< 45% HOTA) failing to meet the accuracy required for clinical translation. These findings motivate the need for advanced and intuitive tracking algorithms and establish CholecTrack20 as a foundation for developing robust AI-driven surgical assistance systems.
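To make the three perspectives concrete, the sketch below assigns track identities to a single tool under one plausible reading of the abstract: an intraoperative track spans the whole procedure, an intracorporeal track restarts each time the tool re-enters the body, and a visibility track restarts each time the tool re-enters the camera's view. These rules, the state names, and the function `assign_track_ids` are illustrative assumptions, not the dataset's actual annotation schema or tooling.

```python
# Hypothetical per-frame states for ONE tool (names are assumptions for this sketch).
VISIBLE = "visible"          # tool is inside the body and in the camera's view
OUT_OF_VIEW = "out_of_view"  # tool is inside the body but outside the camera's view
OUT_OF_BODY = "out_of_body"  # tool has been withdrawn from the body


def assign_track_ids(states):
    """Return (frame, intraoperative_id, intracorporeal_id, visibility_id)
    for every frame in which the tool is visible.

    Assumed rules (not taken from the dataset specification):
      - intraoperative: one identity for the whole procedure
      - intracorporeal: new identity each time the tool re-enters the body
      - visibility: new identity each time the tool re-enters the camera's view
    """
    rows = []
    intraoperative_id = 1
    intracorporeal_id = 0
    visibility_id = 0
    prev = OUT_OF_BODY  # the tool starts outside the body

    for frame, state in enumerate(states):
        if state != OUT_OF_BODY and prev == OUT_OF_BODY:
            intracorporeal_id += 1  # tool (re-)entered the body
        if state == VISIBLE and prev != VISIBLE:
            visibility_id += 1      # tool (re-)entered the camera's view
        if state == VISIBLE:
            rows.append((frame, intraoperative_id, intracorporeal_id, visibility_id))
        prev = state
    return rows


if __name__ == "__main__":
    timeline = [VISIBLE, VISIBLE, OUT_OF_VIEW, VISIBLE, OUT_OF_BODY, VISIBLE]
    for row in assign_track_ids(timeline):
        print(row)
```

Under these assumed rules, the same physical tool keeps one intraoperative identity while its intracorporeal and visibility identities change at different moments, which is why a tracker tuned to one perspective may re-identify poorly under another.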

Paper Structure (15 sections, 10 figures, 3 tables)

Figures (10)

  • Figure 1: Illustration of multi-perspective tracking in surgical domain and CholecTrack20 dataset labels for surgical tool tracking.
  • Figure 2: Multi-perspective trajectories of surgical tool.
  • Figure 3: Examples of images from CholecTrack20 tracking dataset with the labels overlaid on the raw images.
  • Figure 4: Dataset statistics on the distributions of (a) surgical scene visual challenges across data splits (b) track labels across perspectives, averaged across videos. Track length in seconds.
  • Figure 5: 3D visualization of label alignments showing the tool position over track time. The coloring is for grouping features according to: (a) tool classes and (b) tool operators.
  • ...and 5 more figures