G3R: Generating Rich and Fine-grained mmWave Radar Data from 2D Videos for Generalized Gesture Recognition

Kaikai Deng, Dong Zhao, Wenxin Zheng, Yue Ling, Kangwen Yin, Huadong Ma

TL;DR

G3R addresses the data scarcity barrier in generalized mmWave radar gesture recognition by converting abundant 2D videos into rich, fine-grained radar data. It introduces a modular pipeline consisting of a gesture reflection point generator, a signal simulation model, and an encoder-decoder that aligns synthetic data with real radar distributions, so that recognition models can be trained mostly on generated data plus a small amount of real data. Across diverse postures, positions, and scenes, G3R delivers high recognition accuracy (up to 97.32% with limited real data) and generalizes to new users and multi-user coexistence scenarios. The work reduces data collection costs while maintaining robust performance, advancing privacy-preserving, contactless gesture sensing in real-world environments.

Abstract

Millimeter wave radar has recently gained traction as a promising modality for pervasive and privacy-preserving gesture recognition. However, the lack of rich and fine-grained radar datasets hinders progress in developing generalized deep learning models for gesture recognition across various user postures (e.g., standing, sitting), positions, and scenes. To remedy this, we design a software pipeline that exploits abundant 2D videos to generate realistic radar data, which requires addressing the challenge of simulating the diversified and fine-grained reflection properties of user gestures. To this end, we design G3R with three key components: (i) a gesture reflection point generator that expands the arm's skeleton points to form human reflection points; (ii) a signal simulation model that simulates the multipath reflection and attenuation of radar signals to output the human intensity map; and (iii) an encoder-decoder model that combines a sampling module and a fitting module to address the differences in number and distribution of points between generated and real-world radar data, thereby generating realistic radar data. We implement and evaluate G3R using 2D videos from public data sources and self-collected real-world radar data, demonstrating its superiority over other state-of-the-art approaches for gesture recognition.
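
To make the three stages of the abstract concrete, the following is a minimal illustrative sketch and not the G3R implementation. It assumes hypothetical 3D arm joints in meters with the radar at the origin, expands them by linear interpolation with small jitter, uses the textbook free-space fourth-power range falloff as a stand-in for the signal simulation model (multipath is ignored), and uses uniform random subsampling in place of the learned sampling and fitting modules. All function names and parameters are assumptions made for illustration.

```python
import math
import random


def expand_skeleton(joints, points_per_bone=5, jitter=0.01, seed=0):
    """Interpolate points along each bone (consecutive joint pair) and add jitter.

    Assumption: joints are (x, y, z) tuples in meters; points_per_bone >= 2.
    """
    rng = random.Random(seed)
    points = []
    for start, end in zip(joints, joints[1:]):
        for i in range(points_per_bone):
            t = i / (points_per_bone - 1)
            points.append(tuple(
                a + t * (b - a) + rng.uniform(-jitter, jitter)
                for a, b in zip(start, end)
            ))
    return points


def relative_intensity(point, radar=(0.0, 0.0, 0.0)):
    """Relative received power under a free-space 1/d^4 falloff (no multipath)."""
    d = math.dist(point, radar)
    return 1.0 / d ** 4


def downsample(points, k, seed=0):
    """Random subsampling: a crude placeholder for a learned sampling module."""
    rng = random.Random(seed)
    return rng.sample(points, min(k, len(points)))


if __name__ == "__main__":
    # Hypothetical shoulder, elbow, and wrist positions.
    arm = [(0.0, 1.0, 1.4), (0.3, 1.0, 1.2), (0.5, 1.0, 1.3)]
    reflection_points = expand_skeleton(arm)
    sparse = downsample(reflection_points, k=4)
    for p in sparse:
        print(p, relative_intensity(p))
```

The random subsampling step only mimics the fact that real radar point clouds contain far fewer points than the expanded skeleton; matching the real distribution is what the paper's encoder-decoder is designed to learn.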

Paper Structure

This paper contains 33 sections, 8 equations, 30 figures.

Figures (30)

  • Figure 1: Five different gestures. Red arrows represent the gesture's movement directions, while blue dotted and solid lines represent the starting and ending of different gestures, respectively.
  • Figure 2: Examples of real-world radar data for five gestures. Red and blue points represent the data distribution of a user performing the same gesture while standing and sitting, respectively.
  • Figure 3: Recognition accuracy as sitting posture samples increase.
  • Figure 4: Recognition accuracy as the number of positions increases.
  • Figure 5: Recognition accuracy with different scenes.
  • ...and 25 more figures