Radar-Based Recognition of Static Hand Gestures in American Sign Language
Christian Schuessler, Wenxuan Zhang, Johanna Bräunig, Marcel Hoffmann, Michael Stelzig, Martin Vossiek
TL;DR
This work tackles privacy-preserving static hand-gesture recognition for VR/HCI by employing a high-density imaging radar (94 transmit (TX) and 94 receive (RX) antennas) to classify American Sign Language letters from image-like data. A radar ray-tracing simulator with a geometrical optics material model (controlled by a diffuse/specular mix parameter $\alpha$) generates diverse synthetic microwave images, which are used to train a ResNet-based classifier. Tested on 104 real measurements, networks trained only on synthetic data achieve robust performance. Deeper models yield higher F1-scores, and some confusion remains between visually similar signs such as 'A' and 'E'. The study demonstrates data-efficient, privacy-friendly radar-based gesture recognition and motivates adopting digital-twin simulation to reduce measurement campaigns and enable cost-effective hardware designs with fewer antenna channels.
Abstract
In the fast-paced field of human-computer interaction (HCI) and virtual reality (VR), automatic gesture recognition has become increasingly essential. This is particularly true for the recognition of hand signs, which provide an intuitive way to navigate and control VR and HCI applications. Given increased privacy requirements, radar sensors emerge as a compelling alternative to cameras. They operate effectively in low-light conditions and do not capture identifiable human details, because they offer lower resolution than cameras and operate at wavelengths far from those of visible light. While previous works predominantly deploy radar sensors for dynamic hand gesture recognition based on Doppler information, our approach prioritizes classification using an imaging radar that operates on spatial information, i.e., image-like data. However, generating the large training datasets required for neural networks (NN) is a time-consuming and challenging process, and the resulting datasets often fall short of covering all potential scenarios. To address these challenges, this study explores the efficacy of synthetic data generated by an advanced radar ray-tracing simulator. This simulator employs an intuitive material model that can be adjusted to introduce data diversity. Although the NN is trained exclusively on synthetic data, it demonstrates promising performance when tested on real measurement data. This underlines the practicality of our methodology in overcoming data scarcity and advancing automatic gesture recognition in VR and HCI applications.
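To make the adjustable material model more concrete, the following is a minimal sketch of how a single mix parameter $\alpha$ could blend a diffuse and a specular scattering term for one ray hit. It is not the simulator's actual implementation. The function names, the Lambertian diffuse term, the Phong-style specular lobe, the `shininess` exponent, and the convention that $\alpha = 1$ means purely diffuse are all assumptions made for illustration.

```python
import math


def _dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(_dot(v, v))
    return tuple(x / norm for x in v)


def _reflect(d, n):
    """Mirror the propagation direction d about the unit normal n."""
    k = 2.0 * _dot(d, n)
    return tuple(di - k * ni for di, ni in zip(d, n))


def mixed_reflectance(incident, normal, to_receiver, alpha, shininess=20.0):
    """Hypothetical diffuse/specular blend for one ray-surface hit.

    incident    : propagation direction of the ray arriving at the surface
    normal      : outward surface normal
    to_receiver : direction from the hit point toward the receiver
    alpha       : mix parameter in [0, 1]; 1 = purely diffuse,
                  0 = purely specular (assumed convention)
    shininess   : width of the specular lobe (illustrative parameter)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    d = _normalize(incident)
    n = _normalize(normal)
    v = _normalize(to_receiver)
    # Lambertian term: depends only on the angle of incidence.
    diffuse = max(0.0, -_dot(d, n))
    # Specular term: peaks when the receiver lies on the mirror direction.
    specular = max(0.0, _dot(_reflect(d, n), v)) ** shininess
    return alpha * diffuse + (1.0 - alpha) * specular
```

Under these assumptions, a monostatic setup looking at a surface tilted by 45 degrees receives almost nothing when the surface is purely specular and a clear return when it is purely diffuse. Drawing a different $\alpha$ for each simulated image would be one way to obtain the data diversity mentioned above.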
