Towards Robust and Interpretable EMG-based Hand Gesture Recognition using Deep Metric Meta Learning

Simon Tam, Shriram Tallam Puranam Raghu, Étienne Buteau, Erik Scheme, Mounir Boukadoum, Alexandre Campeau-Lecours, Benoit Gosselin

TL;DR

This work tackles the poor generalization of EMG-based hand gesture recognition in unconstrained environments by reframing EMG pattern recognition (EMG PR) as metric-based representation learning. It introduces a Siamese deep convolutional neural network (SDCNN) trained with a contrastive triplet loss to produce a semantically meaningful embedding. A nearest-centroid classifier and a proximity-based confidence estimator for decision rejection then operate on that embedding. Across in-domain, domain-divergent, and out-of-domain scenarios, the proposed SDCNN outperforms baselines on confidence-based metrics and in online decision rejection, demonstrating improved generalization and interpretability. The results suggest a practical, robust EMG PR framework for real-world myoelectric interfaces, with implications for closed-loop, multimodal human-machine interfaces (HMIs) and clinician-informed training.
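The triplet objective behind the Siamese embedding can be sketched in a few lines. This is a minimal NumPy sketch that assumes Euclidean distance and a margin of 1.0, neither of which is a value taken from the paper. The embedding network is omitted; its weights are shared across the Siamese branches, and precomputed embedding vectors stand in for its output. The function name `triplet_loss` is hypothetical.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean triplet margin loss over a batch of embeddings.

    anchor, positive, negative: arrays of shape (batch, dim).
    `positive` shares the anchor's gesture class; `negative` does not.
    ASSUMPTION: Euclidean distance and margin=1.0 are illustrative choices.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor-negative distance
    # Hinge: penalize triplets where the negative is not at least `margin`
    # farther from the anchor than the positive is.
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


if __name__ == "__main__":
    a = np.array([[0.0, 0.0]])
    p = np.array([[0.0, 1.0]])
    print(triplet_loss(a, p, np.array([[3.0, 0.0]])))  # well separated -> 0.0
    print(triplet_loss(a, p, np.array([[0.0, 1.5]])))  # margin violated -> 0.5
```

Triplets whose negative already lies at least `margin` farther from the anchor than the positive contribute zero loss. Training therefore concentrates on the triplets that still violate the desired class separation, which is what shapes the embedding into compact, well-separated class clusters.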

Abstract

Current electromyography (EMG) pattern recognition (PR) models have been shown to generalize poorly in unconstrained environments, hindering their adoption in applications such as hand gesture control. This problem is often due to limited training data, exacerbated by the use of supervised classification frameworks that are known to be suboptimal in such settings. In this work, we propose a shift to deep metric-based meta-learning in EMG PR to supervise the creation of meaningful and interpretable representations. We use a Siamese Deep Convolutional Neural Network (SDCNN) and a contrastive triplet loss to learn an EMG feature embedding space that captures the distribution of the different classes. A nearest-centroid approach is then employed for inference, relying on how closely a test sample aligns with the established class distributions. We derive a robust class proximity-based confidence estimator that leads to better rejection of incorrect decisions, i.e., false positives, especially when operating beyond the training data domain. We demonstrate the efficacy of our approach by evaluating the trained SDCNN's predictions and confidence estimates on unseen data, both within and outside the training domain. The evaluation metrics include the accuracy-rejection curve and the Kullback-Leibler divergence between the confidence distributions of accurate and inaccurate predictions. The proposed approach outperforms comparable models on both metrics, showing that metric-based meta-learning improves the classifier's precision in active decisions (after rejection) and thus leads to better generalization and applicability.
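The inference and evaluation steps described in the abstract can be sketched in NumPy. In this minimal sketch, class centroids are per-class means of training embeddings. The confidence score is a softmax over negative centroid distances with a hypothetical temperature `tau`. This is an illustrative stand-in, not the paper's derived estimator. The KL direction (correct relative to incorrect) and the default of 35 histogram bins are also assumptions, and all function names are hypothetical.

```python
import numpy as np


def class_centroids(embeddings, labels):
    """Mean embedding per class, computed from the training (support) set."""
    classes = np.unique(labels)
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_with_confidence(query, classes, centroids, tau=1.0):
    """Nearest-centroid prediction plus a proximity-based confidence score.

    HYPOTHETICAL confidence: softmax over negative Euclidean distances to the
    class centroids; the paper's own estimator may differ.
    """
    dists = np.linalg.norm(query[:, None, :] - centroids[None, :, :], axis=2)  # (n, k)
    logits = -dists / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    idx = probs.argmax(axis=1)  # for tau > 0, this is the nearest centroid
    return classes[idx], probs.max(axis=1)


def accuracy_rejection_curve(correct, confidence, thresholds):
    """(rejection rate, accuracy of accepted predictions) for each threshold."""
    points = []
    for t in thresholds:
        accepted = confidence >= t  # predictions below t are rejected
        rejection_rate = 1.0 - accepted.mean()
        acc = correct[accepted].mean() if accepted.any() else float("nan")
        points.append((float(rejection_rate), float(acc)))
    return points


def confidence_kl(correct, confidence, bins=35, eps=1e-10):
    """KL(P_correct || P_incorrect) between histogrammed confidence scores.

    `correct` must be a boolean array; `eps` avoids log(0) in empty bins.
    """
    edges = np.linspace(0.0, 1.0, bins + 1)
    p, _ = np.histogram(confidence[correct], bins=edges)
    q, _ = np.histogram(confidence[~correct], bins=edges)
    p = p / max(p.sum(), 1) + eps
    q = q / max(q.sum(), 1) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))
```

Because the softmax is monotonic in negative distance, the predicted class is always the nearest centroid. The confidence only reflects how much closer that centroid is than the others, which is what makes threshold-based rejection possible. A larger KL value means the confidence distributions of correct and incorrect predictions are better separated, so a single threshold can reject more errors while discarding fewer correct decisions.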

Paper Structure

This paper contains 29 sections, 2 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: High-density electromyography (HD-EMG) muscle activity mapping. (a) Non-invasive HD-EMG electrode array. (b) Time-series signals are individually processed if needed. This example shows 64 channels corresponding to the $4 \times 16 = 64$ electrodes. (c) For each channel, the mean absolute value (MAV) over a window captures a measure of muscle contraction intensity. (d) The MAV of each channel is mapped to a pixel value in the muscle activity heat map, and the pixel location corresponds to the electrode's physical position in the array.
  • Figure 2: Siamese deep convolutional neural network architecture. The two branches of 2D convolutional feature extractors embed the input data through multiple convolutional layer blocks, each yielding a flattened feature vector. The feature vectors are then compared with a distance function to assess similarity. More than two branches may be used to compare larger tuples of input data.
  • Figure 3: Gesture classes used for classification: 0) close fist, 1) thumbs up, 2) chuck grip, 3) rest, 4) fine pinch, 5) index extension.
  • Figure 4: In-domain test confidence score distribution. The vertical axes represent the percentage of total test predictions. Close-up view with 100 bins; overview with 35 bins. Calibration curves, representing the percentage of correct predictions in each bin, are overlaid (secondary Y-axis), with a linear trend (dotted line) shown as the reference for ideal calibration.
  • Figure 5: Domain-divergent test confidence score distribution. The vertical axes represent the percentage of total test predictions. Close-up view with 100 bins; overview with 35 bins. Calibration curves are overlaid (secondary Y-axis), with a linear trend shown as the reference.
  • ...and 4 more figures
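Two of the figure-level computations can be sketched in NumPy: the per-channel MAV heat map of Figure 1 and the per-bin calibration curves overlaid in Figures 4 and 5. This is a minimal sketch. The row-major channel-to-electrode ordering and the function names are hypothetical, and a real HD-EMG array needs its own channel map.

```python
import numpy as np


def mav_heatmap(window, rows=4, cols=16):
    """Map one HD-EMG window to a rows x cols muscle-activity image.

    window: array of shape (n_samples, rows * cols).
    ASSUMPTION: channel i sits at grid position (i // cols, i % cols).
    """
    mav = np.mean(np.abs(window), axis=0)  # one MAV value per channel
    return mav.reshape(rows, cols)         # pixel location = electrode position


def calibration_curve(confidence, correct, bins=35):
    """Fraction of correct predictions within each confidence bin.

    Empty bins are reported as NaN. `correct` must be a boolean array.
    """
    edges = np.linspace(0.0, 1.0, bins + 1)
    # Digitizing against the interior edges yields bin indices 0..bins-1,
    # with a confidence of exactly 1.0 falling into the last bin.
    idx = np.digitize(confidence, edges[1:-1])
    acc = np.full(bins, np.nan)
    for b in range(bins):
        in_bin = idx == b
        if in_bin.any():
            acc[b] = correct[in_bin].mean()
    return edges, acc
```

A perfectly calibrated model has a per-bin accuracy equal to the bin's confidence, so its calibration curve follows the dotted linear reference in Figures 4 and 5. Points below that line indicate overconfidence.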