MoEMba: A Mamba-based Mixture of Experts for High-Density EMG-based Hand Gesture Recognition

Mehran Shabanpour, Kasra Rad, Sadaf Khademi, Arash Mohammadi

TL;DR

The paper tackles inter-session variability in HD-sEMG gesture recognition by introducing MoEMba, a Mamba-based Mixture of Experts (MoE) framework that combines Selective State-Space Models with wavelet feature modulation and channel attention. This approach captures long-range temporal dependencies and cross-channel interactions while keeping computational cost low enough for real-time use, achieving a balanced accuracy of $56.9\%$ on CapgMyo DB-b and demonstrating robustness to session shifts. Key contributions include the first application of Mamba to HD-sEMG gesture recognition, an adaptive MoE design with sparsity and balance constraints, and the integration of wavelet transform feature modulation (WTFM) to fuse time-domain and frequency-domain information. The results show competitive performance with lower complexity than transformer-based models, pointing to a practical path toward reliable HD-sEMG-driven HCI, prosthetics control, and neuromuscular applications. Future work may explore synthetic data augmentation and multi-modal integration.

Abstract

High-Density surface Electromyography (HD-sEMG) has emerged as a pivotal resource for Human-Computer Interaction (HCI), offering direct insights into muscle activities and motion intentions. However, a significant challenge in practical implementations of HD-sEMG-based models is the low accuracy of inter-session and inter-subject classification. Variability between sessions can reach up to 40% due to the inherent temporal variability of HD-sEMG signals. Targeting this challenge, the paper introduces the MoEMba framework, a novel approach leveraging Selective State-Space Models (SSMs) to enhance HD-sEMG-based gesture recognition. The MoEMba framework captures temporal dependencies and cross-channel interactions through channel attention techniques. Furthermore, wavelet feature modulation is integrated to capture multi-scale temporal and spatial relations, improving signal representation. Experimental results on the CapgMyo HD-sEMG dataset demonstrate that MoEMba achieves a balanced accuracy of 56.9%, outperforming its state-of-the-art counterparts. The proposed framework's robustness to session-to-session variability and its efficient handling of high-dimensional multivariate time series data highlight its potential for advancing HD-sEMG-powered HCI systems.
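For readers unfamiliar with the metric quoted above, balanced accuracy is commonly defined as the mean of the per-class recalls, so each gesture counts equally regardless of how many samples it has. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `balanced_accuracy` and the toy labels are hypothetical, and the eight-class chance level is inferred from the $0.125$ value given for Figure 2.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred, n_classes):
    """Mean of per-class recall; classes absent from y_true are skipped."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for c in range(n_classes):
        mask = y_true == c
        if mask.any():
            # Fraction of class-c samples that were predicted as class c.
            recalls.append(np.mean(y_pred[mask] == c))
    return float(np.mean(recalls))

# Toy labels (hypothetical, not data from the paper): 3 imbalanced classes.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2, 2]
score = balanced_accuracy(y_true, y_pred, n_classes=3)

# Chance level for eight equally weighted classes (assumed from Figure 2).
chance = 1.0 / 8
print(score, chance)
```

Under this definition a classifier that always predicts the most frequent gesture scores no better than chance, which is why the metric is a reasonable choice when session shifts affect some classes more than others.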

Paper Structure

This paper contains 5 sections, 10 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: MoEMba Framework. (a) Raw time series data is pre-processed via patching. (b) Shallow features are extracted using small $(3\times3)$ and large $(7\times7)$ receptive field convolutions, and wavelet transform-modulated features, with channel attention applied to EMG signal patches. (c) A Mixture of Experts (MoE) block routes patches to two Mamba experts based on a gating network. (d) The Mamba block incorporates projections, 1D convolution, selective State-Space Modeling (SSM), and skip connections for temporal dependency learning.
  • Figure 2: Confusion Matrix and Receiver Operating Characteristic (ROC) Curves. (a) The heatmap shows the class-wise prediction probabilities within the primary dataset. The horizontal axis corresponds to predicted labels, while the vertical axis indicates true labels. Diagonal values reflect individual class accuracies, whereas off-diagonal values represent misclassification probabilities. Darker colors signify higher probabilities, while lighter shades indicate lower ones. The chance level is $0.125$. (b) The ROC curves illustrate the classification performance of MoEMba across different discrimination thresholds. Each class is color-coded, with corresponding labels and the Area Under the ROC Curve (AUC) provided in the legend at the bottom right. The dashed black diagonal line represents random chance classification.
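The patching and routing steps described for Figure 1(a) and 1(c) can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration: the signal shape, the patch length, the top-1 routing rule, and the `tanh` linear layers standing in for the two Mamba experts. The summary does not specify the routing rule or the expert internals, so this shows only the general mixture-of-experts pattern, not the MoEMba implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_patches(x, patch_len):
    """Split a (time, channels) signal into flattened non-overlapping patches."""
    n = x.shape[0] // patch_len
    return x[: n * patch_len].reshape(n, patch_len * x.shape[1])

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_forward(patches, w_gate, experts):
    """Top-1 routing (assumed): each patch goes to its highest-scoring expert."""
    gates = softmax(patches @ w_gate)        # (n_patches, n_experts)
    choice = gates.argmax(axis=-1)           # expert index chosen per patch
    out = np.zeros((patches.shape[0], experts[0].shape[1]))
    for k, w in enumerate(experts):
        sel = choice == k
        # Stand-in expert: a tanh linear layer, NOT a Mamba block.
        out[sel] = gates[sel, k:k + 1] * np.tanh(patches[sel] @ w)
    return out, gates, choice

# Hypothetical toy signal: 64 time steps, 16 channels, patches of 8 steps.
x = rng.standard_normal((64, 16))
patches = make_patches(x, patch_len=8)       # shape (8, 128)
w_gate = rng.standard_normal((128, 2)) * 0.1
experts = [rng.standard_normal((128, 32)) * 0.1 for _ in range(2)]

out, gates, choice = moe_forward(patches, w_gate, experts)
# Fraction of patches sent to each expert; a balance constraint would
# push this toward an even split.
load = np.bincount(choice, minlength=2) / len(choice)
print(out.shape, load)
```

The `load` vector is the quantity a balance constraint would typically regularize, since a gate that sends every patch to one expert wastes the capacity of the other.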