Hand Gesture Recognition from Doppler Radar Signals Using Echo State Networks
Towa Sano, Gouhei Tanaka
TL;DR
This paper addresses the need for efficient hand gesture recognition (HGR) from FMCW Doppler radar signals by introducing an Echo State Network (ESN) framework with a parallel multi-reservoir architecture. Feature maps in time–space and time–frequency domains (RTM, DTM, and MDM where applicable) are processed by independent reservoirs, whose final states are concatenated for readout. The approach, evaluated on the Soli and Dop-NET datasets, achieves state-of-the-art or near-state-of-the-art accuracy with substantially lower training and inference costs than deep learning baselines, demonstrating strong potential for edge-enabled, real-time HGR. The results indicate that multi-reservoir ESNs can effectively fuse heterogeneous radar feature maps while mitigating interference, making them well-suited for resource-constrained HCI applications and robust deployment across subjects and sessions.
Abstract
Hand gesture recognition (HGR) is a fundamental technology in human-computer interaction (HCI). In particular, HGR based on Doppler radar signals is suited for in-vehicle interfaces and robotic systems, which call for lightweight and computationally efficient recognition techniques. However, conventional deep learning-based methods still suffer from high computational costs. To address this issue, we propose an Echo State Network (ESN) approach for radar-based HGR using frequency-modulated continuous-wave (FMCW) radar signals. Raw radar data are first converted into feature maps, such as range-time and Doppler-time maps, which are then fed into one or more recurrent neural network-based reservoirs. The resulting reservoir states are processed by readout classifiers, including ridge regression, support vector machines, and random forests. Comparative experiments demonstrate that our method outperforms existing approaches on an 11-class HGR task using the Soli dataset and surpasses existing deep learning models on a 4-class HGR task using the Dop-NET dataset. The results indicate that parallel processing with multi-reservoir ESNs is effective for recognizing temporal patterns from multiple feature maps in the time-space and time-frequency domains. Our ESN approaches achieve high recognition performance at low computational cost in HGR, showing great potential for more advanced HCI technologies, especially in resource-constrained environments.
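The pipeline described above (one reservoir per feature map, final states concatenated, then a linear readout) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the leaky-integrator update, the reservoir sizes, the spectral radius, the regularization strength, and the synthetic stand-ins for the range-time and Doppler-time maps are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def make_reservoir(n_in, n_res, spectral_radius=0.9, input_scale=0.5, seed=0):
    """Create fixed random input and recurrent weights (never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-input_scale, input_scale, size=(n_res, n_in))
    w = rng.uniform(-1.0, 1.0, size=(n_res, n_res))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def final_state(feature_map, w_in, w, leak=0.3):
    """Drive a leaky-integrator reservoir with a (T, n_in) map; return the last state."""
    x = np.zeros(w.shape[0])
    for u in feature_map:
        x = (1.0 - leak) * x + leak * np.tanh(w_in @ u + w @ x)
    return x

def encode(sample_maps, reservoirs):
    """Run each feature map through its own reservoir and concatenate final states."""
    return np.concatenate(
        [final_state(m, *r) for m, r in zip(sample_maps, reservoirs)]
    )

def fit_ridge(states, labels, n_classes, alpha=1e-2):
    """Closed-form ridge regression readout on one-hot targets (bias column appended)."""
    targets = np.eye(n_classes)[labels]
    s = np.hstack([states, np.ones((len(states), 1))])
    return np.linalg.solve(s.T @ s + alpha * np.eye(s.shape[1]), s.T @ targets)

def predict(states, w_out):
    """Return the class with the largest readout output for each sample."""
    s = np.hstack([states, np.ones((len(states), 1))])
    return np.argmax(s @ w_out, axis=1)

# Synthetic stand-in data (assumption): 30 gestures, 20 time frames each,
# an 8-bin "range-time" map and a 12-bin "Doppler-time" map per gesture.
rng = np.random.default_rng(42)
n_samples, n_frames, n_classes = 30, 20, 3
rtm = rng.normal(size=(n_samples, n_frames, 8))
dtm = rng.normal(size=(n_samples, n_frames, 12))
y = rng.integers(0, n_classes, size=n_samples)

# One independent reservoir per feature map.
reservoirs = [make_reservoir(8, 50, seed=1), make_reservoir(12, 50, seed=2)]

X = np.stack([encode((rtm[i], dtm[i]), reservoirs) for i in range(n_samples)])
w_out = fit_ridge(X, y, n_classes)
preds = predict(X, w_out)
```

Only `w_out` is learned here, through a single linear solve, which is where the low training cost of ESNs comes from. A support vector machine or random forest readout, as mentioned in the abstract, would replace `fit_ridge` and `predict` while leaving the reservoir encoding unchanged.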
