G3R: Generating Rich and Fine-grained mmWave Radar Data from 2D Videos for Generalized Gesture Recognition
Kaikai Deng, Dong Zhao, Wenxin Zheng, Yue Ling, Kangwen Yin, Huadong Ma
TL;DR
G3R addresses the data scarcity barrier in generalized mmWave radar gesture recognition by converting abundant 2D videos into rich, fine-grained radar data. It introduces a modular pipeline consisting of a gesture reflection point generator, a signal simulation model, and an encoder-decoder model that aligns the synthetic data with real radar distributions. This makes it possible to train mostly on generated data plus a small amount of real data. Across diverse postures, positions, and scenes, G3R achieves high recognition accuracy (up to 97.32% with limited real data) and generalizes well, including to new users and multi-user coexistence scenarios. The work substantially reduces data collection costs while maintaining robust performance, advancing privacy-preserving, contactless gesture sensing in real-world environments.
Abstract
Millimeter-wave radar has recently gained traction as a promising modality for pervasive, privacy-preserving gesture recognition. However, the lack of rich and fine-grained radar datasets hinders the development of deep learning models that generalize across user postures (e.g., standing, sitting), positions, and scenes. To remedy this, we design a software pipeline that exploits abundant 2D videos to generate realistic radar data; doing so requires simulating the diverse, fine-grained reflection properties of user gestures. To this end, we design G3R with three key components: (i) a gesture reflection point generator that expands the arm's skeleton points into human reflection points; (ii) a signal simulation model that simulates the multipath reflection and attenuation of radar signals to output a human intensity map; and (iii) an encoder-decoder model that combines a sampling module and a fitting module to bridge the differences in the number and distribution of points between generated and real-world radar data, yielding realistic radar data. We implement G3R and evaluate it using 2D videos from public data sources and self-collected real-world radar data, demonstrating its superiority over other state-of-the-art gesture recognition approaches.
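To make the three-stage data flow concrete, the following is a minimal, hypothetical Python sketch; it is not the authors' implementation. It assumes (a) each arm segment between consecutive 3D skeleton joints can be modeled as a cylinder whose surface points serve as reflection points (`expand_arm_skeleton`), (b) per-point intensity follows a simple free-space radar-equation falloff proportional to 1/R^4, with multipath deliberately ignored (`simulate_intensity`), and (c) the learned sampling module can be stood in for by intensity-weighted random resampling to a fixed point count (`resample_points`). All function names, the cylinder radius, and the sampling densities are illustrative assumptions.

```python
import numpy as np


def expand_arm_skeleton(joints, radius=0.04, n_along=20, n_around=12):
    """Hypothetical reflection point generator: treat each pair of consecutive
    3D joints (e.g. shoulder->elbow->wrist) as a cylinder of the given radius
    (metres) and return points sampled on its surface."""
    joints = np.asarray(joints, dtype=float)
    points = []
    for a, b in zip(joints[:-1], joints[1:]):
        axis = b - a
        length = np.linalg.norm(axis)
        if length == 0:
            continue  # skip degenerate segments
        u = axis / length
        # Build two unit vectors orthogonal to the segment axis.
        helper = np.array([1.0, 0.0, 0.0]) if abs(u[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
        v = np.cross(u, helper)
        v /= np.linalg.norm(v)
        w = np.cross(u, v)
        for t in np.linspace(0.0, 1.0, n_along):
            center = a + t * axis
            for th in np.linspace(0.0, 2 * np.pi, n_around, endpoint=False):
                points.append(center + radius * (np.cos(th) * v + np.sin(th) * w))
    return np.array(points)


def simulate_intensity(points, radar_pos=(0.0, 0.0, 0.0), rcs=1.0):
    """Simplified attenuation model (assumption): received power ~ rcs / R^4.
    Multipath reflection, which G3R does simulate, is not modeled here."""
    r = np.linalg.norm(points - np.asarray(radar_pos, dtype=float), axis=1)
    return rcs / np.maximum(r, 1e-6) ** 4


def resample_points(points, intensity, n_out, rng):
    """Stand-in for the learned sampling module: draw a fixed number of points,
    favouring stronger reflectors, so the count matches a real radar frame."""
    p = intensity / intensity.sum()
    replace = n_out > len(points)
    idx = rng.choice(len(points), size=n_out, replace=replace, p=p)
    return points[idx], intensity[idx]


# Usage: shoulder, elbow, wrist roughly 1.5 m in front of a radar at the origin.
joints = [[0.0, 0.3, 1.5], [0.0, 0.0, 1.5], [0.25, 0.0, 1.4]]
pts = expand_arm_skeleton(joints)
inten = simulate_intensity(pts)
rng = np.random.default_rng(0)
sub_pts, sub_inten = resample_points(pts, inten, 64, rng)
print(pts.shape, sub_pts.shape)
```

The fixed-size resampling step mirrors, in a very crude way, the abstract's point that generated and real radar frames differ in point count; G3R itself uses a learned encoder-decoder with separate sampling and fitting modules rather than the weighted random draw assumed here.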
