DiG-Net: Enhancing Human-Robot Interaction through Hyper-Range Dynamic Gesture Recognition in Assistive Robotics

Eran Bamani Beeri, Eden Nissinman, Avishai Sintov

Abstract

Dynamic hand gestures play a pivotal role in assistive human-robot interaction (HRI), facilitating intuitive, non-verbal communication, particularly for individuals with mobility constraints or those operating robots remotely. Current gesture recognition methods are mostly limited to short-range interactions, reducing their utility in scenarios demanding robust assistive communication from afar. In this paper, we present DiG-Net, the first dynamic gesture recognition framework enabling robust operation at hyper-range distances of up to 30 meters, specifically designed for assistive robotics to enhance accessibility and improve quality of life. Our proposed Distance-aware Gesture Network (DiG-Net) effectively combines Depth-Conditioned Deformable Alignment (DADA) blocks with Spatio-Temporal Graph modules, enabling robust processing and classification of gesture sequences captured under challenging conditions, including significant physical attenuation, reduced resolution, and dynamic gesture variations commonly experienced in real-world assistive environments. We further introduce the Radiometric Spatio-Temporal Depth Attenuation Loss (RSTDAL), shown to enhance learning and strengthen model robustness across varying distances. Our model demonstrates significant performance improvement over state-of-the-art gesture recognition frameworks, achieving a recognition accuracy of 97.3% on a diverse dataset with challenging hyper-range gestures. By effectively interpreting gestures from considerable distances, DiG-Net significantly enhances the usability of assistive robots in home healthcare, industrial safety, and remote assistance scenarios, enabling seamless and intuitive interactions for users regardless of physical limitations.

Paper Structure

This paper contains 16 sections, 10 equations, 9 figures, 9 tables.

Figures (9)

  • Figure 1: A user instructing a robot to go back by sweeping an open palm forward and backward from a hyper-range distance. Besides having only a low-resolution view of the user's hand, the robot may confuse this dynamic gesture with the static stop gesture.
  • Figure 2: Overview of the proposed DiG-Net framework for hyper-range dynamic hand gesture recognition. The model combines Depth-Conditioned Deformable Alignment (DADA), Spatio-Temporal Graph (STG) modules, and Graph Transformer encoders to recognize gestures from RGB videos at distances up to 30 meters.
  • Figure 3: Example frames from the collected gesture dataset showing different users, gestures, and distances in indoor and outdoor environments.
  • Figure 4: The eight dynamic gestures used in the analysis: (a) beckoning, (b) go-back, (c) move-right, (d) move-left, (e) turn-around, (f) follow-me, (g) go-down, and (h) go-up.
  • Figure 5: Gesture recognition success rate of the DiG-Net model as a function of the user’s distance ($\rho$) from the camera. Performance gradually decreases at longer ranges due to lower image resolution and atmospheric effects.
  • ...and 4 more figures