Radar-Based Recognition of Static Hand Gestures in American Sign Language

Christian Schuessler; Wenxuan Zhang; Johanna Bräunig; Marcel Hoffmann; Michael Stelzig; Martin Vossiek

TL;DR

This work tackles privacy-preserving static hand-gesture recognition for VR/HCI by employing a high-density imaging radar (94 TX and 94 RX) to classify American Sign Language letters from image-like data. A radar ray-tracing simulator with a geometrical optics material model (controlled by a diffuse/specular mix parameter $\alpha$) generates diverse synthetic microwave images, which are used to train a ResNet-based classifier. Tested on 104 real measurements, networks trained only on synthetic data achieve robust performance, with deeper models yielding higher F1-scores and some confusion occurring between visually similar signs like 'A' and 'E'. The study demonstrates data-efficient, privacy-friendly radar-based gesture recognition and motivates adopting digital-twin simulation to reduce measurement campaigns and enable cost-effective hardware designs with fewer antenna channels.

Abstract

In the fast-paced field of human-computer interaction (HCI) and virtual reality (VR), automatic gesture recognition has become increasingly essential. This is particularly true for the recognition of hand signs, which provide an intuitive way to navigate and control VR and HCI applications. Considering increased privacy requirements, radar sensors emerge as a compelling alternative to cameras. They operate effectively in low-light conditions without capturing identifiable human details, thanks to their lower resolution and their much longer wavelength compared to visible light. While previous works predominantly deploy radar sensors for dynamic hand gesture recognition based on Doppler information, our approach prioritizes classification using an imaging radar that operates on spatial information, i.e., image-like data. However, generating the large training datasets required for neural networks (NN) is a time-consuming and challenging process that often falls short of covering all potential scenarios. Acknowledging these challenges, this study explores the efficacy of synthetic data generated by an advanced radar ray-tracing simulator. This simulator employs an intuitive material model that can be adjusted to introduce data diversity. Although the NN is trained exclusively on synthetic data, it demonstrates promising performance when tested on real measurement data. This emphasizes the practicality of our methodology in overcoming data scarcity challenges and advancing the field of automatic gesture recognition in VR and HCI applications.
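To make the role of the mix parameter $\alpha$ more concrete, here is a minimal sketch of how a diffuse/specular blend could weight a single ray. It is not the simulator's actual material model, which this summary does not spell out. The linear blend, the convention that $\alpha = 1$ means purely diffuse, the cosine-power specular lobe, and the names `reflect_weight`, `shininess`, and `sample_alphas` are all assumptions made for illustration.

```python
import random


def reflect_weight(cos_incident, cos_mirror, alpha, shininess=50.0):
    """Hypothetical reflection weight for one ray (illustrative only).

    cos_incident: cosine of the angle between surface normal and incoming ray.
    cos_mirror:   cosine of the angle between the mirror direction and the
                  direction towards the receiving antenna.
    alpha:        assumed convention, 1.0 = purely diffuse, 0.0 = purely specular.
    shininess:    assumed exponent controlling the width of the specular lobe.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    diffuse = max(cos_incident, 0.0)               # Lambertian cosine term
    specular = max(cos_mirror, 0.0) ** shininess   # narrow lobe around mirror direction
    return alpha * diffuse + (1.0 - alpha) * specular


def sample_alphas(n, low=0.0, high=1.0, seed=0):
    """Draw n mix parameters so each synthetic image uses a different material."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]
```

Under these assumptions, drawing a different `alpha` per simulated hand is one simple way to introduce the data diversity the abstract refers to.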

Paper Structure (7 sections, 5 equations, 6 figures)

Introduction
Simulation Approach
Measurement Hardware and Signal Model
3D Hand Model
Machine Learning Approach
Results
Discussion and Future Work

Figures (6)

Figure 1: Example showing the hand sign 'A' from ASL. In the first row (a), the intensity and depth image of the simulation are shown. In the second row (b), the same images are shown for a real measurement.
Figure 2: This figure shows the complete simulation process. In the first step, the scene is sampled by ray-tracing to collect all possible signal paths from all TX antennas to all RX antennas. In the second step, the final signal is generated.
Figure 3: Photo of the measurement setup. The hand pose is sensed with the imaging radar in the back.
Figure 4: Complete workflow of the NN. Both intensity-image and depth-image are concatenated and fed into the ResNet backbone. A common classification head follows afterwards.
Figure 5: First 8 characters of the American Sign Language alphabet as 3D model.
...and 1 more figure
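The input step described in the Figure 4 caption can be sketched as follows. This is a minimal illustration that assumes "concatenated" means stacking the intensity and depth images as two channels of one input, using plain nested lists in place of tensors. The function name `stack_channels` and the channel-first layout are assumptions, and the ResNet backbone and classification head are not reproduced here.

```python
def stack_channels(intensity, depth):
    """Stack two H x W images into one 2 x H x W input (channel-first).

    Both images are nested lists of numbers. Channel 0 holds the intensity
    image and channel 1 the depth image (assumed ordering).
    """
    same_height = len(intensity) == len(depth)
    same_width = all(len(a) == len(b) for a, b in zip(intensity, depth))
    if not (same_height and same_width):
        raise ValueError("intensity and depth images must have the same shape")
    # Copy each row so the stacked input does not alias the source images.
    return [[row[:] for row in intensity], [row[:] for row in depth]]


# Usage with two tiny 2 x 2 placeholder images.
sample = stack_channels([[0.1, 0.2], [0.3, 0.4]], [[1.0, 1.1], [1.2, 1.3]])
```

With a tensor library, the same step would typically be a single concatenation along the channel axis before the backbone.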
