SurgiTrack: Fine-Grained Multi-Class Multi-Tool Tracking in Surgical Videos

Chinedu Innocent Nwoye, Nicolas Padoy

TL;DR

This work tackles multi-class multi-tool tracking in surgical videos by introducing SurgiTrack, which combines a YOLOv7 detector with direction-based re-identification and a Harmonizing Bipartite Graph Matching (HBGM) framework to reconcile identities across three trajectory perspectives: visibility, intracorporeal, and intraoperative. The CholecTrack20 dataset enables this multi-perspective evaluation by providing detailed annotations, operator labels, and visually challenging conditions. The approach demonstrates that a tool's originating direction, used as a proxy for its operator, yields better re-identification than appearance-based signals, and that HBGM effectively resolves cross-perspective identity conflicts, achieving leading performance on HOTA, MOTA, and IDF1 across the intraoperative, intracorporeal, and visibility trajectories. The results show real-time capability and robustness to surgical visual challenges (bleeding, smoke, occlusion, reflections) and to varying frame rates, with potential to enhance computer-assisted surgery through more reliable tool tracking and trajectory reasoning.
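
For readers unfamiliar with the metrics named above, the following is a minimal sketch of how MOTA and IDF1 are computed from aggregate counts, using their standard definitions. The function names and the example counts are illustrative assumptions, not values or code from the paper. HOTA is omitted because it requires averaging over localization thresholds.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


# Hypothetical counts, chosen only to exercise the formulas.
example_mota = mota(false_negatives=10, false_positives=5, id_switches=5, num_gt=100)
example_idf1 = idf1(idtp=80, idfp=10, idfn=10)
```

MOTA is dominated by detection errors, while IDF1 rewards keeping the same identity over a whole trajectory, which is why both are usually reported alongside HOTA for tracking.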

Abstract

Accurate tool tracking is essential for the success of computer-assisted intervention. Previous efforts often modeled tool trajectories rigidly, overlooking the dynamic nature of surgical procedures, especially in tracking scenarios such as out-of-body and out-of-camera views. Addressing this limitation, the new CholecTrack20 dataset provides detailed labels that account for multiple tool trajectories in three perspectives: (1) intraoperative, (2) intracorporeal, and (3) visibility, representing the different temporal durations of tool tracks. These fine-grained labels enhance tracking flexibility but also increase the task complexity. Re-identifying tools after occlusion or re-insertion into the body remains challenging due to high visual similarity, especially among tools of the same category. This work recognizes the critical role of the tool operators in distinguishing tool track instances, especially those belonging to the same tool category. Operator information is, however, not explicitly captured in surgical videos. We therefore propose SurgiTrack, a novel deep learning method that leverages YOLOv7 for precise tool detection and employs an attention mechanism to model the originating direction of the tools, as a proxy for their operators, for tool re-identification. To handle diverse tool trajectory perspectives, SurgiTrack employs harmonizing bipartite graph matching (HBGM), minimizing conflicts and ensuring accurate tool identity association. Experimental results on CholecTrack20 demonstrate SurgiTrack's effectiveness, outperforming baselines and state-of-the-art methods with real-time inference capability. This work sets a new standard in surgical tool tracking, providing dynamic trajectories for more adaptable and precise assistance in minimally invasive surgeries.
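
The idea of using originating direction to separate same-class tools can be illustrated with a small sketch. It is not the paper's HBGM algorithm and does not harmonize identities across the three perspectives. It shows only a single-perspective, class-gated bipartite assignment. Everything here is an assumption made for illustration: the 2-D `direction` vectors stand in for the learned direction features, the `max_cost` gate is arbitrary, and a brute-force search over assignments replaces a proper solver such as the Hungarian algorithm, which is workable only because few tools are visible at once.

```python
import math
from itertools import permutations

INF = float("inf")


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def build_cost(tracks, detections):
    """cost[t][d]: direction distance, or INF when tool classes differ."""
    return [
        [
            cosine_distance(t["direction"], d["direction"])
            if t["cls"] == d["cls"] else INF
            for d in detections
        ]
        for t in tracks
    ]


def match(tracks, detections, max_cost=0.5):
    """Return {detection_index: track_id} for the minimum-cost assignment.

    A detection may stay unmatched at a penalty of max_cost, so pairs
    costlier than max_cost are never selected.
    """
    cost = build_cost(tracks, detections)
    # Each detection takes a distinct track index or a None (unmatched) slot.
    slots = list(range(len(tracks))) + [None] * len(detections)
    best_total, best = INF, ()
    for assign in permutations(slots, len(detections)):
        total = 0.0
        for d_idx, t_idx in enumerate(assign):
            if t_idx is None:
                total += max_cost
            else:
                c = cost[t_idx][d_idx]
                total += c if c <= max_cost else INF
        if total < best_total:
            best_total, best = total, assign
    return {d: tracks[t]["id"] for d, t in enumerate(best) if t is not None}


# Hypothetical scene: two graspers entering from opposite sides, one hook.
tracks = [
    {"id": 1, "cls": "grasper", "direction": (-1.0, 0.0)},
    {"id": 2, "cls": "grasper", "direction": (1.0, 0.0)},
    {"id": 3, "cls": "hook", "direction": (0.0, 1.0)},
]
detections = [
    {"cls": "grasper", "direction": (0.9, 0.1)},
    {"cls": "grasper", "direction": (-0.8, -0.2)},
    {"cls": "hook", "direction": (0.1, 1.0)},
    {"cls": "clipper", "direction": (0.0, 1.0)},  # no existing track of this class
]
result = match(tracks, detections)
```

In this toy scene the two graspers are indistinguishable by class, yet their directions assign them to different tracks, while the clipper stays unmatched and would start a new track.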

Paper Structure (31 sections, 11 equations, 7 figures, 7 tables)

Figures (7)

  • Figure 1: Surgical tool tracking demonstrating (top) qualitative fine-grained tracking result across multiple tools, classes, and perspectives and (bottom) superior quantitative results compared to the state-of-the-art.
  • Figure 2: Overview of the CholecTrack20 dataset showing localization, tracking, and associated labels [Nwoye2023CholecTrack20].
  • Figure 3: Overview of our proposed tool tracking model showing: (a) the full architecture of SurgiTrack and its major component modules. One of these is the YOLO-based detector. Another is the Siamese-based surgical tool direction estimator, detailed in (b), which also shows an optional head for surgeon operator classification. The last component of SurgiTrack is the harmonizing bipartite graph matching (HBGM) algorithm for tool track identity association under multiple perspectives of tool trajectories (visibility, intracorporeal, and intraoperative), with the full pipeline shown in (c).
  • Figure 4: Impact of Direction Estimation in Tracking Surgical Tools at Varying Video Sampling Rates (i.e., 1, 5, and 25 frames per second, FPS). A demonstration is included in the qualitative video.
  • Figure 5: Performance Assessment of SurgiTrack Amidst Surgical Visual Challenges. Overall performance is tabulated at the top, followed by quantitative and qualitative results showcasing tracking performance on specific visual challenge frames. Values in black denote comparable performance (within $\pm 1.0$ of the average). Values in green indicate above-average performance, while values in red indicate below-average performance. The breakdown explores distinct tracking metrics focusing on detection, localization, and association or re-identification. A demo is included in the qualitative results video.
  • ...and 2 more figures