WaveFormer: A Lightweight Transformer Model for sEMG-based Gesture Recognition
Yanlong Chen, Mattia Orlandi, Pierangelo Maria Rapa, Simone Benatti, Luca Benini, Yawei Li
TL;DR
WaveFormer addresses the challenge of accurate sEMG-based gesture recognition on resource-constrained devices by integrating a learnable WaveletConv front-end with RoPE-attention in a compact Transformer (3.1M parameters). The model performs multiscale time–frequency analysis and efficient attention, achieving state-of-the-art results on multiple datasets (e.g., 95% on EPN612) and enabling real-time deployment with 6.75 ms latency on CPU using INT8 quantization. Key contributions include a trainable multilevel wavelet decomposition, a residual low-frequency path, and RoPE-based classification, which together provide robustness to session variability and electrode drift. The findings highlight the practical potential for prosthetic control and rehabilitation applications, offering a scalable, frequency-aware approach suitable for wearable devices.
Abstract
Human-machine interaction, particularly in prosthetic and robotic control, has seen progress with gesture recognition via surface electromyographic (sEMG) signals. However, classifying similar gestures that produce nearly identical muscle signals remains a challenge, often reducing classification accuracy. Traditional deep learning models for sEMG gesture recognition are large and computationally expensive, limiting their deployment on resource-constrained embedded systems. In this work, we propose WaveFormer, a lightweight transformer-based architecture tailored for sEMG gesture recognition. Our model integrates time-domain and frequency-domain features through a novel learnable wavelet transform, enhancing feature extraction. In particular, the WaveletConv module, a multi-level wavelet decomposition layer with depthwise separable convolution, ensures both efficiency and compactness. With just 3.1 million parameters, WaveFormer achieves 95% classification accuracy on the EPN612 dataset, outperforming larger models. Furthermore, when profiled on a laptop equipped with an Intel CPU, INT8 quantization achieves real-time deployment with a 6.75 ms inference latency.
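To make the idea of a multi-level wavelet decomposition applied per channel more concrete, the following is a minimal sketch in plain Python. It is not the authors' WaveletConv implementation: the function names (`dwt_step`, `multilevel_dwt`) are hypothetical, the filters are fixed Haar filters rather than learned weights, and there is no pointwise convolution, residual path, or attention. It only illustrates how each channel can be split, level by level, into a low-frequency approximation and high-frequency details using a stride-2 filter pair, which is the operation a learnable wavelet layer would parameterize.

```python
import math


def dwt_step(signal, low, high):
    """One decomposition level: slide both filters with stride 2.

    Returns (approximation, detail), each about half the input length.
    """
    k = len(low)
    approx, detail = [], []
    for start in range(0, len(signal) - k + 1, 2):
        window = signal[start:start + k]
        approx.append(sum(w * x for w, x in zip(low, window)))
        detail.append(sum(w * x for w, x in zip(high, window)))
    return approx, detail


def multilevel_dwt(channels, levels, low=None, high=None):
    """Decompose every channel independently (a depthwise operation).

    Assumption: Haar filters are used as a stand-in for the filters that
    a learnable wavelet layer would initialize and then train.
    """
    s = 1.0 / math.sqrt(2.0)
    low = low or [s, s]      # low-pass (averaging) filter
    high = high or [s, -s]   # high-pass (differencing) filter
    result = []
    for channel in channels:
        approx = list(channel)
        details = []
        for _ in range(levels):
            # Recurse on the approximation only; keep each level's detail.
            approx, detail = dwt_step(approx, low, high)
            details.append(detail)
        result.append({"approx": approx, "details": details})
    return result


# Two toy "sEMG channels": a constant signal and a fast alternating one.
toy_channels = [[1.0] * 8, [1.0, -1.0] * 4]
bands = multilevel_dwt(toy_channels, levels=2)
```

In this toy input, the constant channel ends up entirely in the approximation band, while the alternating channel ends up entirely in the first-level detail band, which is the time-frequency separation the abstract refers to. In a trainable variant, `low` and `high` would be per-channel convolution weights updated by gradient descent.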
