SurgiTrack: Fine-Grained Multi-Class Multi-Tool Tracking in Surgical Videos
Chinedu Innocent Nwoye, Nicolas Padoy
TL;DR
This work tackles multi-class multi-tool tracking in surgical videos by introducing SurgiTrack, which combines a YOLOv7 detector with direction-based re-identification and a harmonizing bipartite graph matching (HBGM) framework. HBGM fuses track identities across three trajectory perspectives: visibility, intracorporeal, and intraoperative. The CholecTrack20 dataset enables this multi-perspective evaluation by providing detailed annotations, proxies for operator information, and challenging visual conditions. The results show two things. First, tool motion direction, used as a proxy for operator cues, yields better re-identification than appearance-based features. Second, HBGM effectively resolves identity conflicts across perspectives. SurgiTrack achieves leading HOTA, MOTA, and IDF1 scores on intraoperative, intracorporeal, and visibility trajectories. It also runs in real time and remains robust to surgical visual challenges (bleeding, smoke, occlusion, reflections) and to varying frame rates. More reliable tool tracking and trajectory reasoning of this kind could enhance computer-assisted surgery.
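The three perspectives can be read as nested time spans. A visibility track ends when a tool leaves the camera view. An intracorporeal track ends when the tool leaves the body. An intraoperative track follows the physical tool for the whole procedure. The nesting is our interpretation of these definitions; it is not a rule stated in this summary.

Under that assumption, the sketch below checks one set of per-detection labels for the kind of cross-perspective conflicts a harmonizing step would need to avoid. It flags two cases:

- a visibility ID that spans two intracorporeal IDs, or an intracorporeal ID that spans two intraoperative IDs;
- an ID that is given to two detections in the same frame.

The `ToolLabel` record and the `find_conflicts` helper are hypothetical illustrations. They are not SurgiTrack's HBGM or the CholecTrack20 label format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolLabel:
    """Hypothetical per-detection record carrying one track ID per perspective."""
    frame: int
    visibility_id: int       # assumed to change when the tool leaves the camera view
    intracorporeal_id: int   # assumed to change when the tool leaves the body
    intraoperative_id: int   # assumed fixed for the physical tool over the procedure


def find_conflicts(labels: list[ToolLabel]) -> list[str]:
    """List violations of the assumed nesting (visibility within intracorporeal
    within intraoperative) and IDs shared by two detections in one frame."""
    conflicts: list[str] = []
    vis_parent: dict[int, int] = {}   # visibility ID -> intracorporeal ID first seen
    corp_parent: dict[int, int] = {}  # intracorporeal ID -> intraoperative ID first seen
    used: set[tuple[int, str, int]] = set()  # (frame, perspective, ID) already taken

    for lab in labels:
        # A visibility track must stay inside a single intracorporeal track.
        parent = vis_parent.setdefault(lab.visibility_id, lab.intracorporeal_id)
        if parent != lab.intracorporeal_id:
            conflicts.append(f"frame {lab.frame}: visibility {lab.visibility_id} "
                             f"split across intracorporeal {parent} and {lab.intracorporeal_id}")
        # An intracorporeal track must stay inside a single intraoperative track.
        parent = corp_parent.setdefault(lab.intracorporeal_id, lab.intraoperative_id)
        if parent != lab.intraoperative_id:
            conflicts.append(f"frame {lab.frame}: intracorporeal {lab.intracorporeal_id} "
                             f"split across intraoperative {parent} and {lab.intraoperative_id}")
        # Within one frame, no two detections may share an ID in any perspective.
        for name, tid in (("visibility", lab.visibility_id),
                          ("intracorporeal", lab.intracorporeal_id),
                          ("intraoperative", lab.intraoperative_id)):
            key = (lab.frame, name, tid)
            if key in used:
                conflicts.append(f"frame {lab.frame}: {name} ID {tid} assigned twice")
            used.add(key)
    return conflicts
```

As a usage example, consider one tool that is occluded and then re-inserted. Its labels `ToolLabel(0, 1, 1, 1)`, `ToolLabel(5, 3, 1, 1)`, and `ToolLabel(9, 4, 3, 1)` produce no conflicts. A new visibility ID appears after the occlusion, and a new intracorporeal ID appears after re-insertion, but the intraoperative ID stays the same throughout.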
Abstract
Accurate tool tracking is essential for the success of computer-assisted intervention. Previous efforts often modeled tool trajectories rigidly, overlooking the dynamic nature of surgical procedures, especially in scenarios where tools move out of the body or out of the camera view. To address this limitation, the new CholecTrack20 dataset provides detailed labels that account for multiple tool trajectories in three perspectives: (1) intraoperative, (2) intracorporeal, and (3) visibility, each representing a different temporal duration of a tool track. These fine-grained labels make tracking more flexible but also increase task complexity. Re-identifying tools after occlusion or re-insertion into the body remains challenging because of high visual similarity, especially among tools of the same category. This work recognizes the critical role of tool operators in distinguishing tool track instances, particularly those belonging to the same tool category. However, operator information is not explicitly captured in surgical videos. We therefore propose SurgiTrack, a novel deep learning method that leverages YOLOv7 for precise tool detection and employs an attention mechanism to model the originating direction of each tool, as a proxy for its operator, for tool re-identification. To handle the diverse tool trajectory perspectives, SurgiTrack employs harmonizing bipartite graph matching, which minimizes identity conflicts and ensures accurate tool identity association. Experimental results on CholecTrack20 demonstrate SurgiTrack's effectiveness: it outperforms baselines and state-of-the-art methods while achieving real-time inference. This work sets a new standard in surgical tool tracking, providing dynamic trajectories for more adaptable and precise assistance in minimally invasive surgery.
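The abstract does not specify how candidate matches are scored. The snippet below is a purely illustrative sketch, not SurgiTrack's implementation. It performs minimum-cost bipartite matching between existing tracks and new detections.

The cost is hypothetical. It mixes box overlap (IoU) with the cosine distance between 2D per-tool direction vectors. These vectors stand in for the learned originating-direction cue. The names `match`, `w_dir`, and `max_cost`, and the direction vectors themselves, are assumptions made for illustration.

For clarity, the sketch finds the optimal assignment by enumerating permutations. This is exact, but it is only practical for a handful of tools. A real tracker would use the Hungarian algorithm instead (for example, `scipy.optimize.linear_sum_assignment`). The sketch also does not model harmonization across the three perspectives.

```python
import itertools
import math

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Vec = tuple[float, float]                # hypothetical 2D direction cue per tool


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u: Vec, v: Vec) -> float:
    """Cosine similarity of two 2D vectors (0.0 if either is zero)."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0 or nv == 0:
        return 0.0
    return (u[0] * v[0] + u[1] * v[1]) / (nu * nv)


def match(tracks: list[tuple[Box, Vec]], detections: list[tuple[Box, Vec]],
          w_dir: float = 0.5, max_cost: float = 0.7) -> list[tuple[int, int]]:
    """Return (track_idx, det_idx) pairs of a minimum-cost one-to-one matching."""
    n = max(len(tracks), len(detections))

    def pair_cost(i: int, j: int) -> float:
        # Dummy rows/columns pad the matrix to square; using one means "unmatched".
        if i >= len(tracks) or j >= len(detections):
            return max_cost
        (tb, td), (db, dd) = tracks[i], detections[j]
        # Both terms lie in [0, 1]: low overlap and opposite directions are expensive.
        return (1 - w_dir) * (1 - iou(tb, db)) + w_dir * (1 - cosine(td, dd)) / 2

    # Exhaustive search over all assignments (exact, but only for small n).
    best_perm = min(itertools.permutations(range(n)),
                    key=lambda p: sum(pair_cost(i, p[i]) for i in range(n)))
    # Keep real pairs only, and gate out matches that are too costly.
    return [(i, j) for i, j in enumerate(best_perm)
            if i < len(tracks) and j < len(detections) and pair_cost(i, j) < max_cost]
```

The direction term is what separates two same-class tools whose boxes overlap completely. Suppose two graspers share the box `(0, 0, 10, 10)` but arrive from opposite sides, with directions `(1, 0)` and `(-1, 0)`. IoU alone ties every pairing, whereas the direction term assigns each detection to the track with the matching direction.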
