Surgical Phase and Instrument Recognition: How to identify appropriate Dataset Splits

Georgii Kostiuchik, Lalith Sharan, Benedikt Mayer, Ivo Wolf, Bernhard Preim, Sandy Engelhardt

TL;DR

The paper addresses the problem that machine learning evaluation for surgical phase and instrument recognition can be unreliable when dataset splits fail to reflect the true distribution of phases and instrument co-occurrences. It introduces an interactive visualization framework with a Phase view, an Instrument view, and supplementary panels to audit and improve dataset partitions. Through a user study and an analysis of common Cholec80 splits, the authors demonstrate that splits can omit important phase transitions and instrument co-occurrences, and show how the tool enables re-partitioning to improve representation. The framework, publicly available at the provided URL, offers a practical approach to strengthening model evaluation and guiding dataset construction for surgical workflow analysis.

Abstract

Purpose: Machine learning models can only be reliably evaluated if training, validation, and test data splits are representative and not affected by the absence of classes of interest. Surgical workflow and instrument recognition tasks are complicated in this manner, because of heavy data imbalances resulting from different lengths of phases and their erratic occurrences. Furthermore, the issue becomes difficult as sub-properties that help define phases, like instrument (co-)occurrence, are usually not considered when defining the split. We argue that such sub-properties must be equally considered.

Methods: This work presents a publicly available data visualization tool that enables interactive exploration of dataset splits for surgical phase and instrument recognition. It focuses on the visualization of the occurrence of phases, phase transitions, instruments, and instrument combinations across sets. Particularly, it facilitates the assessment and identification of sub-optimal dataset splits.

Results: We performed an analysis of common Cholec80 dataset splits using the proposed application and were able to uncover phase transitions and combinations of instruments that were not represented in one of the sets. Additionally, we outlined possible improvements to the splits. A user study with ten participants demonstrated the ability of participants to solve a selection of data exploration tasks using the proposed application.

Conclusion: In highly unbalanced class distributions, special care should be taken with respect to the selection of an appropriate dataset split. Our interactive data visualization tool presents a promising approach for the assessment of dataset splits for surgical phase and instrument recognition. Evaluation results show that it can enhance the development of machine learning models. The application is available at https://cardio-ai.github.io/endovis-ml/.
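The kind of gap described in the Results (a phase transition that is not represented in one of the sets) can also be checked programmatically. The following is a minimal sketch, not the authors' tool: it assumes each surgery is available as a list of per-frame phase labels and that a split is a mapping from set name to surgery IDs. The function names, surgery IDs, and phase names are hypothetical toy values, not Cholec80 data.

```python
from itertools import groupby


def phase_transitions(frame_phases):
    """Return the set of (previous, next) phase pairs in one surgery."""
    # Collapse runs of identical frame labels into a phase sequence.
    sequence = [phase for phase, _ in groupby(frame_phases)]
    return set(zip(sequence, sequence[1:]))


def missing_transitions(surgeries, split):
    """Map each set name to the transitions seen in the dataset but not in that set."""
    seen = {name: set() for name in split}
    for name, surgery_ids in split.items():
        for surgery_id in surgery_ids:
            seen[name] |= phase_transitions(surgeries[surgery_id])
    all_transitions = set().union(*seen.values())
    return {name: all_transitions - found for name, found in seen.items()}


# Hypothetical toy data: per-frame phase labels for three surgeries.
surgeries = {
    "v1": ["Prep", "Prep", "Dissection", "Cleaning", "Packaging"],
    "v2": ["Prep", "Dissection", "Dissection", "Packaging"],
    "v3": ["Prep", "Dissection", "Packaging"],
}
split = {"train": ["v1", "v2"], "test": ["v3"]}

report = missing_transitions(surgeries, split)
for set_name, gaps in report.items():
    print(set_name, sorted(gaps))
```

In this toy split, the transitions through the hypothetical "Cleaning" phase occur only in the training surgeries, so a model evaluated on the test set would never be scored on them.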

Paper Structure

This paper contains 17 sections, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Phase view of the proposed application with eight proctocolectomy surgeries from the "Surgical Workflow Analysis in the sensorOR 2017" challenge dataset.
  • Figure 2: Instrument view of the proposed application with eight proctocolectomy surgeries from the "Surgical Workflow Analysis in the sensorOR 2017" challenge dataset (A) and selected combination of grasper and ligasure (B).
  • Figure 3: Overall task completion percentage with the corresponding 95%-confidence intervals.
  • Figure 4: Characteristics and shortcomings of the 40/-/40 split. Surgeries starting in the Calot triangle dissection phase are only present in the training set (A). The ending sequence Gallbladder retraction to Cleaning coagulation occurs only in the training set (B). The instruments bipolar and scissors co-occur only in the training set (C).
  • Figure 5: Characteristics and shortcomings of the 32/8/40 split. Surgeries from the validation set have fewer frames on average, compared to the training and test sets (A). The phase transitions (Gallbladder dissection, Cleaning coagulation) and (Cleaning coagulation, Gallbladder packaging) occur only once in the training set (B). The simultaneous occurrence of the instruments grasper, bipolar, and irrigator is not represented in the validation set (C).
  • ...and 1 more figure
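The instrument-related shortcomings listed in the captions of Figures 4 and 5 (a combination present in only some of the sets) can be illustrated with a short check. This is a sketch under stated assumptions rather than the application's implementation: a "combination" is taken to be the exact set of instruments annotated in a frame, each surgery is a list of such per-frame sets, and all names and data below are hypothetical.

```python
from collections import defaultdict


def combination_coverage(frames, split):
    """Map each instrument combination to the names of the sets in which it occurs."""
    coverage = defaultdict(set)
    for set_name, surgery_ids in split.items():
        for surgery_id in surgery_ids:
            for instruments in frames[surgery_id]:
                if instruments:  # skip frames with no annotated instrument
                    coverage[frozenset(instruments)].add(set_name)
    return dict(coverage)


def unrepresented(coverage, split):
    """Return combinations that are absent from at least one set, with those set names."""
    all_sets = set(split)
    return {
        combo: sorted(all_sets - present)
        for combo, present in coverage.items()
        if all_sets - present
    }


# Hypothetical toy data: per-frame instrument annotations for three surgeries.
frames = {
    "v1": [{"grasper"}, {"grasper", "bipolar"}, {"bipolar", "scissors"}],
    "v2": [{"grasper"}, {"grasper", "hook"}],
    "v3": [{"grasper"}, {"grasper", "hook"}, set()],
}
split = {"train": ["v1", "v2"], "test": ["v3"]}

coverage = combination_coverage(frames, split)
gaps = unrepresented(coverage, split)
for combo, absent_from in gaps.items():
    print(sorted(combo), "missing from", absent_from)
```

Matching on exact per-frame sets is one design choice; checking pairwise co-occurrence instead (any frame containing both instruments) would flag a different, generally smaller, set of gaps.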