CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools
Chinedu Innocent Nwoye, Kareem Elgohary, Anvita Srinivas, Fauzan Zaid, Joël L. Lavanchy, Nicolas Padoy
TL;DR
CholecTrack20 introduces a multi-perspective surgical tool tracking dataset derived from laparoscopic cholecystectomy. It addresses the need for context-rich data beyond conventional single-perspective tracking. It defines intraoperative, intracorporeal, and visibility trajectories. The dataset contains 20 full-length videos annotated at 1 fps, yielding over 35K frames and 65K tool instances. Annotations carry rich attributes such as tool category, identity, operator, phase, and visual challenges. Benchmarking state-of-the-art detectors and multi-object trackers (MOT) reveals substantial gaps (HOTA < 45%), which underscores the need for context-aware re-identification and trajectory modeling in clinical settings. The dataset is accompanied by thorough quality assurance (QA) and analysis. It provides a foundation for developing robust AI-driven surgical assistance systems and is released under CC BY-NC-SA to encourage community adoption and advancement.
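The three trajectory perspectives can be illustrated by deriving per-frame IDs from the timeline of one tool. The following is a minimal sketch under stated assumptions, not the dataset's annotation tooling. It assumes a hypothetical per-frame `FrameState(in_body, visible)` record for a single physical tool whose intraoperative identity is already known. The helper `derive_track_ids` is likewise hypothetical. Under this reading, the IDs behave as follows:

- The intraoperative ID never changes.
- The intracorporeal ID changes each time the tool is reinserted into the body.
- The visibility ID changes whenever the tool reappears in the camera view.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Per-frame IDs: (intraoperative, intracorporeal, visibility); None when unlabeled.
TrackIds = Tuple[Optional[int], Optional[int], Optional[int]]


@dataclass
class FrameState:
    """Hypothetical per-frame state of one physical tool (not the dataset's schema)."""
    in_body: bool   # tool is inside the patient
    visible: bool   # tool appears in the laparoscope's field of view


def derive_track_ids(states: List[FrameState]) -> List[TrackIds]:
    """Split one tool's timeline into the three (assumed) trajectory perspectives.

    IDs are assigned only on frames where the tool is visible, i.e. where a
    bounding box would exist; all other frames get (None, None, None).
    """
    intraop_id = 0      # assumption: one identity for the whole procedure
    intracorp_id = -1   # incremented on each (re)insertion into the body
    vis_id = -1         # incremented each time the tool (re)appears in view
    prev_in_body = False
    prev_visible = False
    out: List[TrackIds] = []
    for s in states:
        if s.visible and not s.in_body:
            raise ValueError("a visible tool must be inside the body")
        if s.in_body and not prev_in_body:
            intracorp_id += 1   # new insertion starts a new intracorporeal trajectory
        if s.visible and not prev_visible:
            vis_id += 1         # re-entering the view starts a new visibility trajectory
        if s.visible:
            out.append((intraop_id, intracorp_id, vis_id))
        else:
            out.append((None, None, None))
        prev_in_body, prev_visible = s.in_body, s.visible
    return out


if __name__ == "__main__":
    # Visible, out of view, visible again, withdrawn, reinserted and visible.
    timeline = [
        FrameState(in_body=True, visible=True),
        FrameState(in_body=True, visible=False),
        FrameState(in_body=True, visible=True),
        FrameState(in_body=False, visible=False),
        FrameState(in_body=True, visible=True),
    ]
    for frame, ids in enumerate(derive_track_ids(timeline)):
        print(frame, ids)
    # 0 (0, 0, 0)
    # 1 (None, None, None)
    # 2 (0, 0, 1)
    # 3 (None, None, None)
    # 4 (0, 1, 2)
```

In the demo timeline, the intraoperative ID stays 0, the intracorporeal ID changes once, and the visibility ID changes twice. This shows how a single set of annotations can support all three trajectory definitions.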
Abstract
Tool tracking in surgical videos is essential for advancing computer-assisted interventions, such as skill assessment, safety zone estimation, and human-machine collaboration. However, the lack of context-rich datasets limits AI applications in this field. Existing datasets rely on overly generic tracking formalizations that fail to capture surgery-specific dynamics, such as tools moving out of the camera's view or exiting the body. This yields trajectories that are less clinically relevant and lack flexibility for real-world surgical applications. Methods trained on these datasets often struggle with visual challenges such as smoke, reflection, and bleeding, further exposing the limitations of current approaches. We introduce CholecTrack20, a specialized dataset for multi-class, multi-tool tracking in surgical procedures. It redefines tracking formalization with three perspectives: (i) intraoperative, (ii) intracorporeal, and (iii) visibility, enabling adaptable and clinically meaningful tool trajectories. The dataset comprises 20 full-length surgical videos, annotated at 1 fps, yielding over 35K frames and 65K labeled tool instances. Annotations include spatial location, category, identity, operator, phase, and scene visual challenge. Benchmarking state-of-the-art methods on CholecTrack20 reveals significant performance gaps: current approaches (< 45% HOTA) fall short of the accuracy required for clinical translation. These findings underscore the need for advanced and intuitive tracking algorithms and establish CholecTrack20 as a foundation for developing robust AI-driven surgical assistance systems.
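For reference, the HOTA (Higher Order Tracking Accuracy) score quoted above follows the standard definition from the multi-object tracking literature. This note assumes that the benchmark uses the standard variant. HOTA is the geometric mean of detection accuracy (DetA) and association accuracy (AssA). It is averaged over localization (IoU) thresholds $\alpha$. For each true-positive match $c$, TPA, FNA, and FPA count the true-positive, false-negative, and false-positive associations along the trajectory of $c$:

```latex
\mathrm{HOTA} = \frac{1}{19}\sum_{\alpha \in \{0.05,\,0.10,\,\ldots,\,0.95\}} \sqrt{\mathrm{DetA}_\alpha \cdot \mathrm{AssA}_\alpha}

\mathrm{DetA}_\alpha = \frac{|\mathrm{TP}_\alpha|}{|\mathrm{TP}_\alpha| + |\mathrm{FN}_\alpha| + |\mathrm{FP}_\alpha|},
\qquad
\mathrm{AssA}_\alpha = \frac{1}{|\mathrm{TP}_\alpha|}\sum_{c \in \mathrm{TP}_\alpha}
  \frac{|\mathrm{TPA}(c)|}{|\mathrm{TPA}(c)| + |\mathrm{FNA}(c)| + |\mathrm{FPA}(c)|}
```

Because HOTA is a geometric mean, weak association limits the score even when detection is good. As a purely hypothetical illustration, a tracker with $\mathrm{DetA}_\alpha = 0.6$ and $\mathrm{AssA}_\alpha = 0.3$ at every threshold scores $\sqrt{0.18} \approx 0.424$, which is below 45%.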
