RoHan: Robust Hand Detection in Operation Room
Roi Papo, Sapir Gershov, Tom Friedman, Itay Or, Gil Bolotin, Shlomi Laufer
TL;DR
RoHan tackles robust gloved-hand detection in operating rooms by addressing domain transfer challenges with two innovations: Artificial Gloves augmentation to bridge visual gaps between public hand datasets and surgical imagery, and a semi-supervised domain adaptation pipeline that iteratively refines pseudo-labels using temporal cues. By pre-training on a diverse set of public datasets augmented with gloves and then applying iterative domain adaptation on ERS and SVGH benchmarks, RoHan achieves state-of-the-art precision and mAP50 while reducing labeling needs. The approach demonstrates strong cross-domain generalization within medical settings and provides new benchmark data to spur further research in surgical hand detection. Overall, RoHan offers a practical, scalable path toward deployable hand-detection systems in clinical environments.
Abstract
Hand-specific localization has garnered significant interest within the computer vision community. Although numerous datasets provide hand annotations from various angles and settings, domain transfer techniques frequently struggle in surgical environments. This is mainly due to the limited availability of gloved-hand instances and the unique conditions of operating rooms (ORs). As a result, hand-detection models tailored to OR settings require extensive training and expensive annotation. To overcome these limitations, we present "RoHan", a novel approach for robust hand detection in the OR that leverages semi-supervised domain adaptation to handle the varying recording conditions, diverse glove colors, and occlusions common in surgical settings. Our methodology comprises two main stages: (1) a data augmentation strategy that uses "Artificial Gloves," a method for augmenting publicly available hand datasets with synthetic images of hands wearing gloves; and (2) a semi-supervised domain adaptation pipeline that improves detection performance in real-world OR settings through iterative prediction refinement and efficient frame filtering. We evaluate our method on two datasets: simulated enterotomy repair and saphenous vein graft harvesting. "RoHan" substantially reduces the need for extensive labeling and model training, paving the way for the practical implementation of hand detection technologies in medical settings.
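To make the frame-filtering idea in the second stage more concrete, the following is a minimal sketch of how pseudo-labeled frames might be selected using a temporal cue. The abstract does not specify the actual selection criteria, so everything here is an assumption made for illustration: the function names `iou` and `select_pseudo_labels`, the thresholds `min_conf` and `min_iou`, and the rule that a frame is kept only when every detection is confident and overlaps a detection in an adjacent frame.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_pseudo_labels(
    frames: List[List[Detection]],
    min_conf: float = 0.8,  # hypothetical confidence threshold
    min_iou: float = 0.5,   # hypothetical temporal-consistency threshold
) -> Dict[int, List[Box]]:
    """Return {frame_index: boxes} for frames kept as pseudo-labels.

    Assumed rule (not taken from the paper): keep a frame only if it has
    at least one detection, and every detection is confident and overlaps
    some detection in the previous or next frame.
    """
    kept: Dict[int, List[Box]] = {}
    for t, dets in enumerate(frames):
        if not dets:
            continue  # nothing to learn from an empty frame
        # Collect boxes from the adjacent frames that exist.
        neighbours: List[Box] = []
        if t > 0:
            neighbours += [box for box, _ in frames[t - 1]]
        if t + 1 < len(frames):
            neighbours += [box for box, _ in frames[t + 1]]
        stable = all(
            conf >= min_conf
            and any(iou(box, nb) >= min_iou for nb in neighbours)
            for box, conf in dets
        )
        if stable:
            kept[t] = [box for box, _ in dets]
    return kept
```

In an iterative scheme of the kind the abstract describes, the frames returned by such a filter could be added to the training set, the detector retrained, and the predictions regenerated for the next round. The number of rounds and the stopping rule are likewise left open here.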
