Depth Over RGB: Automatic Evaluation of Open Surgery Skills Using Depth Camera
Ido Zuckerman, Nicole Werner, Jonathan Kouchly, Emma Huston, Shannon DiMarco, Paul DiMusto, Shlomi Laufer
TL;DR
The paper investigates automatic evaluation of open surgery skills using depth cameras as a privacy-preserving, lighting-robust alternative to RGB. It collects a depth-RGB dataset using Azure Kinect across two suturing simulators and evaluates hand/tool detection with YOLOv8 and gesture segmentation with UVAST and MSTCN++. Key findings show depth cameras achieve comparable object-detection and action-segmentation performance to RGB, while enabling accurate 3D hand-path length metrics that reveal expert–novice differences and reduce camera-angle sensitivity. The study discusses limitations such as dataset size and memory constraints impacting Viterbi-based inference, but argues depth cameras offer practical advantages for training, privacy, and robust skill assessment. Overall, depth-based evaluation provides a viable, privacy-conscious approach to objective assessment of open-surgery skills with potential for wider adoption in training and evaluation workflows.
Abstract
Purpose: We present a novel approach to the automatic evaluation of open surgery skills using depth cameras. We aim to show that depth cameras achieve results similar to those of RGB cameras, the most common modality for automatic evaluation of open surgery skills. Moreover, depth cameras offer advantages such as robustness to lighting variations and camera positioning, simplified data compression, and enhanced privacy, making them a promising alternative to RGB cameras. Methods: Expert and novice surgeons completed tasks on two open suturing simulators. We focused on hand and tool detection and on action segmentation in suturing procedures. YOLOv8 was used for hand and tool detection in RGB and depth videos, and UVAST and MSTCN++ were used for action segmentation. Our study includes the collection and annotation of a dataset recorded with Azure Kinect. Results: Depth cameras achieved results comparable to RGB cameras in both object detection and action segmentation. Furthermore, our analysis of 3D hand path length revealed significant differences between expert and novice surgeons, emphasizing the potential of depth cameras for capturing surgical skill. We also investigated the influence of camera angle on measurement accuracy, highlighting the advantage of 3D cameras in providing a more accurate representation of hand movements. Conclusion: Our research advances the field of surgical skill assessment by leveraging depth cameras for more reliable and privacy-preserving evaluations. The findings suggest that depth cameras can be valuable in assessing surgical skills and provide a foundation for future research in this area.
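To illustrate why a 3D hand path length can differ from one measured in the image plane, the following minimal sketch sums the Euclidean distances between consecutive hand positions, once with depth and once without. It is not the paper's implementation: the function names, the sample track, and the millimetre units are hypothetical, and simply dropping the z coordinate is a simplifying assumption standing in for a real camera projection.

```python
import math

def path_length(points):
    """Sum of Euclidean distances between consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def drop_depth(points):
    """Discard z, keeping only the (x, y) position (simplified 2D view)."""
    return [(x, y) for x, y, _ in points]

# Hypothetical hand track in millimetres:
# 3 mm sideways, then 4 mm toward the camera.
track = [(0.0, 0.0, 500.0), (3.0, 0.0, 500.0), (3.0, 0.0, 496.0)]

length_3d = path_length(track)              # counts motion along all three axes
length_2d = path_length(drop_depth(track))  # misses motion along the camera axis

print(f"3D path length: {length_3d:.1f} mm")
print(f"2D path length: {length_2d:.1f} mm")
```

In this toy track, motion along the camera axis contributes nothing to the 2D length, so the 2D value depends on how the camera is oriented relative to the movement, while the 3D value does not.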
