Geometry of orofacial neuromuscular signals: speech articulation decoding using surface electromyography

Harshavardhana T. Gowda, Zachary D. McNaughton, Lee M. Miller

TL;DR

This work tackles decoding speech articulations from multichannel surface EMG by structuring EMG signals as edge covariances that lie on the SPD manifold. It introduces a geometry-aware pipeline, including Cholesky-based distances, Fréchet means, k-medoids/MDM clustering, and SPD-network architectures (SPDNet and a manifold-aware GRU) to decode gestures, phonemes, and words, plus a NATO-based spelling paradigm. A key contribution is showing that EMG embeddings on the SPD manifold are highly structured and discriminative, enabling data-efficient decoding with small training sets while revealing important cross-subject distribution shifts modeled as changes of basis. The open-source dataset (16 subjects) and code, along with demonstrations of phoneme- and word-level decoding using only ES, establish a foundation for practical EMG-to-language translation and inform model design for subject variability.

Abstract

Objective. In this article, we present data and methods for decoding speech articulations using surface electromyogram (EMG) signals. EMG-based speech neuroprostheses offer a promising approach for restoring audible speech in individuals who have lost the ability to speak intelligibly due to laryngectomy, neuromuscular diseases, stroke, or trauma-induced damage (e.g., from radiotherapy) to the speech articulators. Approach. To achieve this, we collect EMG signals from the face, jaw, and neck as subjects articulate speech, and we perform EMG-to-speech translation. Main results. Our findings reveal that the manifold of symmetric positive definite (SPD) matrices serves as a natural embedding space for EMG signals. Specifically, we provide an algebraic interpretation of the manifold-valued EMG data using linear transformations, and we analyze and quantify distribution shifts in EMG signals across individuals. Significance. Overall, our approach demonstrates significant potential for developing neural networks that are both data- and parameter-efficient, an important consideration for EMG-based systems, which face challenges in large-scale data collection and operate under limited computational resources on embedded devices.
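To make the SPD embedding concrete, the following is a minimal sketch, not the authors' implementation. It assumes an EMG window is stored as a `(channels, samples)` array, uses a plain channel covariance as a stand-in for the paper's edge matrices, and uses the log-Cholesky distance as one possible Cholesky-based distance. The function names, the `eps` regularizer, and the random toy data are all hypothetical.

```python
import numpy as np

def channel_covariance(window, eps=1e-6):
    """Covariance of a (channels, samples) window, regularized to be SPD.

    Assumption: a plain sample covariance stands in for the paper's
    edge matrices, whose exact construction is not given in this summary.
    """
    centered = window - window.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / (window.shape[1] - 1)
    # A small ridge keeps the matrix strictly positive definite.
    return cov + eps * np.eye(window.shape[0])

def log_cholesky_distance(p, q):
    """Log-Cholesky distance between two SPD matrices.

    Compares the strictly lower-triangular parts of the Cholesky factors
    directly and the diagonals after taking logarithms.
    """
    lp, lq = np.linalg.cholesky(p), np.linalg.cholesky(q)
    strict = np.tril(lp, k=-1) - np.tril(lq, k=-1)
    log_diag = np.log(np.diag(lp)) - np.log(np.diag(lq))
    return float(np.sqrt(np.sum(strict ** 2) + np.sum(log_diag ** 2)))

# Toy data: two random 4-channel windows of 200 samples each.
rng = np.random.default_rng(0)
a = channel_covariance(rng.standard_normal((4, 200)))
b = channel_covariance(rng.standard_normal((4, 200)))
print(log_cholesky_distance(a, b))
```

A distance of this kind could then feed a nearest-mean or k-medoids classifier; the choice of metric here is illustrative only.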

Paper Structure

This paper contains 39 sections, 15 equations, 9 figures, 12 tables.

Figures (9)

  • Figure 1: Placement of electrodes on the neck region.
  • Figure 2: Placement of electrodes on cheek and lip regions. Electrode 1 is above the upper lip and electrode 3 is below the lower lip.
  • Figure 3: Different orofacial gestures are naturally distinguishable on the manifold of SPD matrices. t-SNE of edge matrices of various orofacial movements described in section \ref{sec:orofacial} for subject 1. Embedding is colored according to gestures (a.u.: arbitrary units).
  • Figure 4: All edge matrices within an individual can be approximately diagonalized. Blue: average value of $\frac{\max(\textsc{abs}(\textsc{non diag}(\mathcal{E}^{(0)})))}{\max(\textsc{diag}(\mathcal{E}^{(0)}))}$ for all word articulations. Red: average value of $\frac{\max(\textsc{abs}(\textsc{non diag}(\mathcal{E}^{(1)})))}{\max(\textsc{diag}(\mathcal{E}^{(1)}))}$ for all word articulations. The matrices $\mathcal{E}^{(1)}$ are approximately diagonal compared to $\mathcal{E}^{(0)}$.
  • Figure 5: EMG embeddings of speech articulations on the manifold differ substantially across individuals. Approximate eigenbasis vectors ($Q = W^{(1)}$) are different for different individuals. The angle $\theta = \cos^{-1}\left(\frac{\texttt{trace}(Q_iQ_j^T)}{\sqrt{\texttt{trace}(Q_iQ_i^T)}\sqrt{\texttt{trace}(Q_jQ_j^T)}}\right)$ is computed between approximate eigenbasis matrices $Q_i$ and $Q_j$ of different individuals $i$ and $j$.
  • ...and 4 more figures