
Comparative validation of surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation in endoscopy: Results of the PhaKIR 2024 challenge

Tobias Rueckert, David Rauber, Raphaela Maerkl, Leonard Klausmann, Suemeyye R. Yildiran, Max Gutbrod, Danilo Weber Nunes, Alvaro Fernandez Moreno, Imanol Luengo, Danail Stoyanov, Nicolas Toussaint, Enki Cho, Hyeon Bae Kim, Oh Sung Choo, Ka Young Kim, Seong Tae Kim, Gonçalo Arantes, Kehan Song, Jianjun Zhu, Junchen Xiong, Tingyi Lin, Shunsuke Kikuchi, Hiroki Matsuzaki, Atsushi Kouno, João Renato Ribeiro Manesco, João Paulo Papa, Tae-Min Choi, Tae Kyeong Jeong, Juyoun Park, Oluwatosin Alabi, Meng Wei, Tom Vercauteren, Runzhi Wu, Mengya Xu, An Wang, Long Bai, Hongliang Ren, Amine Yamlahi, Jakob Hennighausen, Lena Maier-Hein, Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, Shu Yang, Yihui Wang, Hao Chen, Santiago Rodríguez, Nicolás Aparicio, Leonardo Manrique, Juan Camilo Lyons, Olivia Hosie, Nicolás Ayobi, Pablo Arbeláez, Yiping Li, Yasmina Al Khalil, Sahar Nasirihaghighi, Stefanie Speidel, Daniel Rueckert, Hubertus Feussner, Dirk Wilhelm, Christoph Palm

TL;DR

The paper presents PhaKIR, a MICCAI 2024 EndoVis sub-challenge that jointly benchmarks surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation on a novel multi-center dataset of 13 real-world laparoscopic cholecystectomies. It emphasizes temporal context and context-aware perception, provides a rigorous BIAS-compliant evaluation with bootstrapped metrics, and reveals that transformer-based methods achieve top performance in phase recognition and segmentation, while keypoint estimation remains challenging and attracted limited participation. The study highlights generalization across hospitals as a persistent limitation and calls for more diverse data, robust temporal modeling across tasks, and improved cross-center transfer capabilities. Overall, PhaKIR contributes a high-quality resource and benchmark intended to drive the development of temporally aware, transferable RAMIS scene understanding methods.

Abstract

Reliable recognition and localization of surgical instruments in endoscopic video recordings are foundational for a wide range of applications in computer- and robot-assisted minimally invasive surgery (RAMIS), including surgical training, skill assessment, and autonomous assistance. However, robust performance under real-world conditions remains a significant challenge. Incorporating surgical context - such as the current procedural phase - has emerged as a promising strategy to improve robustness and interpretability. To address these challenges, we organized the Surgical Procedure Phase, Keypoint, and Instrument Recognition (PhaKIR) sub-challenge as part of the Endoscopic Vision (EndoVis) challenge at MICCAI 2024. We introduced a novel, multi-center dataset comprising thirteen full-length laparoscopic cholecystectomy videos collected from three distinct medical institutions, with unified annotations for three interrelated tasks: surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation. Unlike existing datasets, ours enables joint investigation of instrument localization and procedural context within the same data while supporting the integration of temporal information across entire procedures. We report results and findings in accordance with the BIAS guidelines for biomedical image analysis challenges. The PhaKIR sub-challenge advances the field by providing a unique benchmark for developing temporally aware, context-driven methods in RAMIS and offers a high-quality resource to support future research in surgical scene understanding.

Paper Structure

This paper contains 58 sections, 10 equations, 18 figures, 14 tables.

Figures (18)

  • Figure 1: Visualization of the PhaKIR tasks and annotations for each of the three medical centers. For the phase recognition task, the phases preparation (P), Calot triangle dissection (CTD), clipping and cutting (ClCu), gallbladder dissection (GD), gallbladder packaging (GP), cleaning and coagulation (ClCo), and gallbladder retraction (GR) are shown. For the instrument instance segmentation task, the color-encoded masks are presented. For the instrument keypoint estimation task, the keypoint coordinates are visualized, with hidden keypoints outlined in white.
  • Figure 2: Number of participants that registered and submitted for each of the eight individual EndoVis-2024 sub-challenges, sorted in ascending order based on the number of registrations.
  • Figure 3: Timeline of our PhaKIR sub-challenge highlighting key milestones and dates. All dates refer to the year 2024.
  • Figure 4: Number of occurrences for each instrument type, specified for training and test data and sorted in descending order according to the number of occurrences in the training data.
  • Figure 5: Visualization of ranking stability based on bootstrapping for the surgical phase recognition task for the metrics F1-score, BA, and global. For each metric, the individual ranking stability is shown and the global rank is indicated after each team name.
  • ...and 13 more figures
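
The caption of Figure 5 refers to ranking stability based on bootstrapping. The following is a minimal sketch of that general idea, not the challenge's evaluation code: test cases are resampled with replacement, teams are re-ranked on each resample, and the number of times each team lands on each rank is counted. The team names, the per-case scores, the function names, and the choice of one score per test case as the resampling unit are all assumptions made for illustration.

```python
import random
from collections import Counter


def rank_teams(scores_by_team, case_indices):
    """Rank teams (1 = best) by mean score over the selected test cases."""
    means = {
        team: sum(scores[i] for i in case_indices) / len(case_indices)
        for team, scores in scores_by_team.items()
    }
    ordered = sorted(means, key=lambda team: means[team], reverse=True)
    return {team: rank for rank, team in enumerate(ordered, start=1)}


def bootstrap_rank_counts(scores_by_team, n_bootstrap=1000, seed=0):
    """Count how often each team attains each rank across bootstrap samples."""
    rng = random.Random(seed)
    n_cases = len(next(iter(scores_by_team.values())))
    counts = {team: Counter() for team in scores_by_team}
    for _ in range(n_bootstrap):
        # Resample the test cases with replacement (same size as the original set).
        sample = [rng.randrange(n_cases) for _ in range(n_cases)]
        for team, rank in rank_teams(scores_by_team, sample).items():
            counts[team][rank] += 1
    return counts


# Hypothetical per-case F1-scores for three made-up teams (not challenge results).
scores = {
    "team_a": [0.90, 0.85, 0.88, 0.92, 0.87],
    "team_b": [0.80, 0.82, 0.79, 0.85, 0.81],
    "team_c": [0.60, 0.65, 0.58, 0.70, 0.62],
}
rank_counts = bootstrap_rank_counts(scores, n_bootstrap=200, seed=0)
for team, counter in rank_counts.items():
    print(team, dict(counter))
```

A team whose counts concentrate on a single rank has a stable placement under resampling. Counts spread over several ranks indicate that the ordering depends on which test cases happen to be included.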