Gesture Recognition for FMCW Radar on the Edge

Maximilian Strobel, Stephan Schoenfeldt, Jonas Daugalas

TL;DR

This paper tackles touchless human-computer interaction using a 60 GHz FMCW radar. It combines an edge-optimized radar processing pipeline with a compact five-feature representation and a GRU-based RNN to detect and classify five gestures, avoiding heavy 2D processing. The work introduces a lightweight target-detection and feature-extraction chain and a tiny neural network trained with label refinement and data augmentation, and demonstrates practical edge deployment on an Arm Cortex-M4 with modest memory and power budgets. On a hold-out test set, it achieves an F1 score of 98.4% while running on resource-constrained hardware (RAM ~120 kB, flash ~278 kB, power ~75 mW).

Abstract

This paper introduces a lightweight gesture recognition system based on 60 GHz frequency-modulated continuous-wave (FMCW) radar. We show that gestures can be characterized efficiently by a set of five features, and propose a slim radar processing algorithm to extract these features. In contrast to previous approaches, we avoid heavy 2D processing, i.e. range-Doppler imaging, and instead perform an early target detection. This allows us to port the system to fully embedded platforms with tight constraints on memory, compute, and power consumption. A recurrent neural network (RNN) based architecture exploits these features to jointly detect and classify five different gestures. The proposed system recognizes gestures with an F1 score of 98.4% on our hold-out test dataset. It runs on an Arm Cortex-M4 microcontroller, requiring less than 280 kB of flash memory and 120 kB of RAM while consuming 75 mW of power.

Paper Structure (13 sections, 5 figures)

This paper contains 13 sections, 5 figures.

Figures (5)

  • Figure 1: The set of gestures used in this work.
  • Figure 2: The range profile shows two targets above the threshold, the moving hand and the body of the person. Instead of selecting the target with the highest signal strength, the body, we select the closest target, the hand. This results in an efficient but stable target detection, which is immune to random body movements of the person performing the gesture.
  • Figure 3: The matrix shows prototypical characteristics for the five gestures. The bold lines were generated by averaging 3200 samples per gesture; the shaded area around them indicates one standard deviation. The gestures differ in various characteristics, e.g. SwipeLeft and SwipeRight show opposite trends in the horizontal angle.
  • Figure 4: The sample panel shows the label refinement, plotting the closest distance above a specific amplitude threshold. The emphasized area indicates the position of the gesture label; all other frames are labeled as background. Gesture samples like the one in the sample panel are used to compose an artificial gesture sequence, which is shown in the sequence panel.
  • Figure 5: The confusion matrix shows the averaged results of ten randomly initialized training runs. Gestures mispredicted as background are false negatives, whereas background predicted as gesture is a false positive.
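
The closest-target selection described in the caption of Figure 2 can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the range profile is a sequence of magnitudes ordered by increasing distance, that a single fixed threshold is used, and that the target is reported as the peak of the first run of bins above that threshold. The function name, the threshold value, and the example profile are hypothetical.

```python
def select_closest_target(range_profile, threshold):
    """Return the peak bin of the nearest target above `threshold`.

    Assumption: bins are ordered by increasing distance, so the first
    contiguous run of bins above the threshold is the closest target.
    Returns None when no bin exceeds the threshold.
    """
    peak_bin = None
    for bin_index, value in enumerate(range_profile):
        magnitude = abs(value)  # also works for complex-valued bins
        if magnitude > threshold:
            # Inside the first run: keep track of its strongest bin.
            if peak_bin is None or magnitude > abs(range_profile[peak_bin]):
                peak_bin = bin_index
        elif peak_bin is not None:
            # The first run has ended; ignore all farther targets.
            break
    return peak_bin


# Hypothetical profile: a weak near target (hand) and a strong far one (body).
profile = [0.1, 0.2, 0.9, 1.4, 0.8, 0.2, 0.1, 2.0, 3.5, 2.2, 0.3]

closest_bin = select_closest_target(profile, threshold=0.5)
strongest_bin = max(range(len(profile)), key=lambda i: abs(profile[i]))

print(closest_bin)    # → 3 (the nearer, weaker target)
print(strongest_bin)  # → 8 (the farther, stronger target)
```

In this toy profile, picking the strongest bin would lock onto the far target, while the closest-target rule returns the near one, which is the behavior the caption attributes to the hand and body.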