RoHan: Robust Hand Detection in Operation Room

Roi Papo, Sapir Gershov, Tom Friedman, Itay Or, Gil Bolotin, Shlomi Laufer

TL;DR

RoHan tackles robust gloved-hand detection in operating rooms by addressing domain transfer challenges with two innovations: Artificial Gloves augmentation to bridge visual gaps between public hand datasets and surgical imagery, and a semi-supervised domain adaptation pipeline that iteratively refines pseudo-labels using temporal cues. By pre-training on a diverse set of public datasets augmented with gloves and then applying iterative domain adaptation on ERS and SVGH benchmarks, RoHan achieves state-of-the-art precision and mAP50 while reducing labeling needs. The approach demonstrates strong cross-domain generalization within medical settings and provides new benchmark data to spur further research in surgical hand detection. Overall, RoHan offers a practical, scalable path toward deployable hand-detection systems in clinical environments.

Abstract

Hand-specific localization has garnered significant interest within the computer vision community. Although there are numerous datasets with hand annotations from various angles and settings, domain transfer techniques frequently struggle in surgical environments. This is mainly due to the limited availability of gloved hand instances and the unique challenges of operating rooms (ORs). Thus, hand-detection models tailored to OR settings require extensive training and expensive annotation processes. To overcome these challenges, we present "RoHan", a novel approach for robust hand detection in the OR that leverages semi-supervised domain adaptation techniques to handle the varying recording conditions, diverse glove colors, and occlusions common in surgical settings. Our methodology encompasses two main stages: (1) a data augmentation strategy that utilizes "Artificial Gloves," a method for augmenting publicly available hand datasets with synthetic images of hands wearing gloves; (2) a semi-supervised domain adaptation pipeline that improves detection performance in real-world OR settings through iterative prediction refinement and efficient frame filtering. We evaluate our method using two datasets: simulated enterotomy repair and saphenous vein graft harvesting. "RoHan" substantially reduces the need for extensive labeling and model training, paving the way for the practical implementation of hand detection technologies in medical settings.

Paper Structure (15 sections, 7 figures, 4 tables)

Figures (7)

  • Figure 1: Robust Hand Detection. We propose the use of Semi-Supervised Domain Adaptation and Self-Training Techniques for hand detection. These techniques provide predictions that are robust across various shooting angles and lighting conditions. Specifically, they are effective in scenarios including: a) top views, b) views from a surgeon's head-mounted camera, c) zoomed-in views of the surgeon's working area, and d) side views of the patient's bed. Faces were manually blurred.
  • Figure 2: Frames from the ERS dataset. (A) Top view; (B) Surface view.
  • Figure 3: In the images above, Artificial Gloves are applied to public datasets using the previously mentioned Segmentation mask gloves method, featuring various glove colors and blood patterns. In the image below, 3D Hands Recovery gloves are utilized to simulate gloves.
  • Figure 4: Our Semi-Supervised Domain Adaptation Self-Training Pipeline
  • Figure 5: Spatial Filtering. We filter out bounding boxes whose centers fall outside the area of interest.
  • ...and 2 more figures
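
The spatial filtering described in the Figure 5 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the axis-aligned rectangular area of interest, the `(x_min, y_min, x_max, y_max)` pixel box format, and the names `box_center` and `filter_boxes_by_center` are all assumptions made for illustration, since the caption only states that boxes are dropped when their centers lie outside the area of interest.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_center(box: Box) -> Tuple[float, float]:
    """Return the (x, y) center of an axis-aligned bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def filter_boxes_by_center(boxes: List[Box], roi: Box) -> List[Box]:
    """Keep only the boxes whose center lies inside the area of interest.

    `roi` is a hypothetical rectangular area of interest in the same
    format as the boxes. Centers exactly on the boundary are kept.
    """
    rx_min, ry_min, rx_max, ry_max = roi
    kept = []
    for box in boxes:
        cx, cy = box_center(box)
        if rx_min <= cx <= rx_max and ry_min <= cy <= ry_max:
            kept.append(box)
    return kept


if __name__ == "__main__":
    detections = [(10, 10, 30, 30), (90, 90, 130, 130), (0, 0, 4, 4)]
    area_of_interest = (5, 5, 100, 100)
    print(filter_boxes_by_center(detections, area_of_interest))
```

Testing the box center rather than requiring full containment keeps hands that only partly overlap the edge of the working area, provided most of the box is inside it. A non-rectangular area of interest would need a point-in-polygon test in place of the comparison above.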