Time Frequency Analysis of EMG Signal for Gesture Recognition using Fine grained Features
Parshuram N. Aarotale, Ajita Rattani
TL;DR
This work tackles EMG-based gesture recognition by targeting fine-grained time-frequency cues that conventional CNNs often miss. It introduces XMANet, a cross-layer mutual attention network that treats each CNN layer as an expert and exchanges attention across shallow-to-deep layers, augmented by attention-based crops and a mutual learning schedule. STFT spectrograms and wavelet-based scalograms serve as the time-frequency inputs, and experiments on Grabmyo and FORS-EMG show consistent accuracy gains over strong CNN baselines across multiple backbones. The approach improves robustness and accuracy in EMG gesture classification, suggesting strong potential for prosthetic control and human–machine interfaces, with future work directed towards fairness and self-supervised signal representations.
Abstract
Electromyography (EMG)-based hand gesture recognition converts forearm muscle activity into control commands for prosthetics, rehabilitation, and human-computer interaction. This paper proposes a novel approach to EMG-based hand gesture recognition that uses fine-grained classification and presents XMANet, which unifies low-level local and high-level semantic cues through cross-layer mutual attention among shallow-to-deep CNN experts. Using stacked spectrograms and scalograms derived from the Short-Time Fourier Transform (STFT) and Wavelet Transform (WT), we benchmark XMANet against ResNet50, DenseNet121, MobileNetV3, and EfficientNetB0. Experimental results on the Grabmyo dataset indicate that, using STFT, the proposed XMANet model outperforms the baseline ResNet50, EfficientNetB0, MobileNetV3, and DenseNet121 models, with improvements of approximately 1.72%, 4.38%, 5.10%, and 2.53%, respectively. With WT, improvements of around 1.57%, 1.88%, 1.46%, and 2.05% are observed over the same baselines. Similarly, on the FORS-EMG dataset, the XMANet(ResNet50) model using STFT shows an improvement of about 5.04% over the baseline ResNet50, while the XMANet(DenseNet121) and XMANet(MobileNetV3) models yield improvements of approximately 4.11% and 2.81%, respectively. Moreover, when using WT, the proposed XMANet achieves gains of around 4.26%, 9.36%, 5.72%, and 6.09% over the baseline ResNet50, DenseNet121, MobileNetV3, and EfficientNetB0 models, respectively. These results show that XMANet consistently improves performance across the tested architectures and signal processing techniques, demonstrating the strong potential of fine-grained features for accurate and robust EMG classification.
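To make the time-frequency inputs mentioned in the abstract concrete, the following is a minimal NumPy sketch of how a multi-channel EMG window might be turned into stacked spectrograms (STFT) and scalograms (WT). It is an illustration under stated assumptions, not the authors' pipeline: the function names, the Hann window, the window length and hop, the Morlet wavelet with `w0 = 6`, the frequency grid, the 1 kHz sampling rate, and the reading of "stacked" as one map per electrode channel are all hypothetical choices.

```python
import numpy as np

def stft_spectrogram(x, win_len=64, hop=16):
    """Log-magnitude STFT of a 1-D signal; returns (win_len // 2 + 1, n_frames)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([x[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T
    return np.log1p(mag)

def morlet_scalogram(x, fs, freqs, w0=6.0):
    """Magnitude of a Morlet wavelet transform; returns (len(freqs), len(x)).

    Assumes len(x) is longer than the longest wavelet (lowest frequency).
    """
    out = np.empty((len(freqs), len(x)))
    for k, f in enumerate(freqs):
        sigma = w0 / (2 * np.pi * f)          # temporal width in seconds
        half = int(np.ceil(4 * sigma * fs))   # truncate at +/- 4 sigma
        t = np.arange(-half, half + 1) / fs
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma**2))
        wavelet /= np.abs(wavelet).sum()      # L1 normalisation across scales
        out[k] = np.abs(np.convolve(x, wavelet, mode="same"))
    return out

def stack_time_frequency(emg, transform):
    """Apply `transform` to each row of emg (channels, samples) and stack the maps."""
    return np.stack([transform(ch) for ch in emg])

# Synthetic stand-in for one EMG window: 4 channels, 1 s at an assumed 1 kHz.
fs = 1000
rng = np.random.default_rng(0)
emg = rng.standard_normal((4, fs))
freqs = np.linspace(20, 400, 20)              # assumed analysis band in Hz

spectrograms = stack_time_frequency(emg, stft_spectrogram)
scalograms = stack_time_frequency(emg, lambda ch: morlet_scalogram(ch, fs, freqs))
print(spectrograms.shape, scalograms.shape)
```

Each resulting array has a leading channel axis, so it can be fed to an image backbone much like a multi-channel image. In practice the maps would typically be resized and normalised to match the backbone's expected input, which this sketch omits.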
