EgoSurgery-HTS: A Dataset for Egocentric Hand-Tool Segmentation in Open Surgery Videos
Nathan Darjana, Ryo Fujii, Hideo Saito, Hiroki Kajita
TL;DR
EgoSurgery-HTS addresses the need for pixel-level understanding of egocentric open-surgery videos. It introduces a dataset annotated for three tasks: tool instance segmentation (14 tool classes), hand instance segmentation (4 hand classes), and hand-tool segmentation, which labels hands together with the tools they manipulate. Built on EgoSurgery, the dataset uses SAM to generate segmentation masks from bounding-box prompts, followed by manual corrections, and reports statistics on tool co-occurrence and hand-tool associations. The authors benchmark four mainstream instance segmentation methods (Mask R-CNN, QueryInst, Mask2Former, SOLOv2) on the three tasks. Query-based architectures such as Mask2Former and QueryInst perform strongly, particularly on hand and hand-tool segmentation, and training on EgoSurgery-HTS yields better hand and hand-tool segmentation on egocentric surgical video than training on existing datasets such as EgoHands and VISOR-HOS. The work establishes a benchmark for open-surgery scene understanding that can support action recognition, workflow analysis, and potentially real-time AI-assisted intervention. The authors acknowledge current limitations, including data imbalance and the need for more diverse tools and surgical environments.
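The tool co-occurrence and hand-tool association statistics amount to counting class pairs over annotated frames. The following is a minimal sketch that assumes a hypothetical per-frame annotation format (the real EgoSurgery-HTS schema may differ) in which each hand records the index of the tool instance it holds. The tool names and hand labels such as `own_right` are illustrative placeholders, not the dataset's actual class list.

```python
from collections import Counter
from itertools import combinations

# HYPOTHETICAL annotation format, for illustration only: each frame lists tool
# instances by class name, and each hand stores the index of the tool instance
# it is manipulating (None if the hand holds nothing).
frames = [
    {"tools": ["scalpel", "forceps"],
     "hands": [{"hand": "own_right", "holds": 0},
               {"hand": "own_left", "holds": 1}]},
    {"tools": ["forceps", "forceps", "scissors"],
     "hands": [{"hand": "own_left", "holds": 0},
               {"hand": "other_right", "holds": None}]},
]


def tool_cooccurrence(frames):
    """Count unordered pairs of distinct tool classes appearing in the same frame."""
    counts = Counter()
    for frame in frames:
        classes = sorted(set(frame["tools"]))  # each class counted once per frame
        for a, b in combinations(classes, 2):
            counts[(a, b)] += 1
    return counts


def hand_tool_associations(frames):
    """Count (hand class, tool class) pairs where the hand holds that tool."""
    counts = Counter()
    for frame in frames:
        for h in frame["hands"]:
            if h["holds"] is not None:
                counts[(h["hand"], frame["tools"][h["holds"]])] += 1
    return counts


print(tool_cooccurrence(frames))
print(hand_tool_associations(frames))
```

Deduplicating classes per frame keeps two instances of the same tool from inflating the pair counts. Whether the paper counts per frame or per instance is not stated here.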
Abstract
Egocentric open-surgery videos capture rich, fine-grained details essential for accurately modeling surgical procedures and human behavior in the operating room. A detailed, pixel-level understanding of hands and surgical tools is crucial for interpreting a surgeon's actions and intentions. We introduce EgoSurgery-HTS, a new dataset with pixel-wise annotations and a benchmark suite for segmenting surgical tools, hands, and interacting tools in egocentric open-surgery videos. Specifically, we provide labeled data for (1) tool instance segmentation of 14 distinct surgical tools, (2) hand instance segmentation, and (3) hand-tool segmentation, which labels hands and the tools they manipulate. Using EgoSurgery-HTS, we extensively evaluate state-of-the-art segmentation methods and show that training on our dataset significantly improves hand and hand-tool segmentation accuracy in egocentric open-surgery videos compared with training on existing datasets. The dataset will be released at https://github.com/Fujiry0/EgoSurgery.
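Benchmarking instance segmentation methods rests on matching predicted instance masks to ground-truth masks. The summary does not spell out the evaluation protocol. The sketch below assumes COCO-style mask evaluation, which is the usual choice for Mask R-CNN, QueryInst, Mask2Former, and SOLOv2, and shows only the mask IoU and greedy matching step for a single image and class. AP would then be computed from these true/false-positive flags across the whole test set. The function names `mask_iou` and `match_predictions` are hypothetical.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks with the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def match_predictions(preds, gts, thr=0.5):
    """Greedy COCO-style matching for one image and one class (a sketch).

    preds: list of (score, mask) pairs; gts: list of ground-truth masks.
    Returns True (true positive) / False (false positive) per prediction,
    in descending score order. Each ground truth can be matched at most once.
    """
    matched = [False] * len(gts)
    results = []
    for _, pmask in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = thr, -1
        for j, g in enumerate(gts):
            if matched[j]:
                continue  # already claimed by a higher-scoring prediction
            iou = mask_iou(pmask, g)
            if iou >= best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0:
            matched[best_j] = True
        results.append(best_j >= 0)
    return results


gt = np.zeros((4, 4), dtype=bool)
gt[:2, :2] = True                      # 4-pixel ground-truth instance
p_exact = gt.copy()                    # IoU 1.0
p_wide = np.zeros((4, 4), dtype=bool)
p_wide[:2, :3] = True                  # IoU 4/6
print(match_predictions([(0.9, p_wide), (0.8, p_exact)], [gt]))  # [True, False]
```

The example shows why matching is greedy by score. The higher-scoring but looser mask claims the ground-truth instance first, so the exact mask is counted as a false positive.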
