EMG-to-Speech with Fewer Channels

Injune Hwang, Jaejun Lee, Kyogu Lee

TL;DR

This work addresses the practicality of EMG-to-speech by focusing on channel-efficient, open-vocabulary silent speech decoding. It combines a systematic analysis of channel contributions (backward elimination and an exhaustive evaluation of 4-channel subsets) with phoneme-level insights and a training strategy that pretrains on full-channel data with channel dropout and then fine-tunes on reduced-channel inputs. The key findings reveal complementary interactions among channels: 4–6-channel systems benefit from fine-tuning a model pretrained with randomized channel dropout, yielding robust performance despite sensor reduction. Collectively, the results support developing lightweight, wearable EMG-based silent speech systems that maintain high-quality speech reconstruction with fewer sensors.
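The two channel-analysis procedures named above can be sketched as follows. This is a minimal illustration, not the authors' code: `score` is a hypothetical stand-in for "train a model on this channel subset and measure WER" (lower is better), and `toy_wer`, `GAIN`, and `SYNERGY` are made-up values chosen only so the example runs instantly. Backward elimination greedily drops the channel whose removal hurts least; the exhaustive search scores all C(8, 4) = 70 four-channel subsets.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Tuple

Subset = FrozenSet[int]
ScoreFn = Callable[[Subset], float]  # lower is better (e.g. WER)


def backward_elimination(channels: Iterable[int], score: ScoreFn) -> List[Tuple[Subset, float]]:
    """Greedily remove, one at a time, the channel whose removal raises the score least.

    Returns the (subset, score) path from the full set down to a single channel.
    """
    current: Subset = frozenset(channels)
    path = [(current, score(current))]
    while len(current) > 1:
        # Evaluate every subset with one channel fewer and keep the best one.
        best_score, dropped = min((score(current - {c}), c) for c in sorted(current))
        current = current - {dropped}
        path.append((current, best_score))
    return path


def exhaustive_search(channels: Iterable[int], k: int, score: ScoreFn) -> List[Tuple[float, Subset]]:
    """Score every k-channel subset; return (score, subset) pairs, best first."""
    results = [(score(frozenset(s)), frozenset(s)) for s in combinations(sorted(channels), k)]
    results.sort(key=lambda r: r[0])
    return results


# Hypothetical stand-in for a real train-and-evaluate run:
# per-channel gains plus one complementary pair (channels 3 and 6).
GAIN = {0: 6.0, 1: 5.0, 2: 4.0, 3: 3.0, 4: 2.5, 5: 2.0, 6: 1.5, 7: 1.0}
SYNERGY = {frozenset({3, 6}): 4.0}


def toy_wer(subset: Subset) -> float:
    wer = 60.0 - sum(GAIN[c] for c in subset)
    wer -= sum(bonus for pair, bonus in SYNERGY.items() if pair <= subset)
    return wer


if __name__ == "__main__":
    top = exhaustive_search(range(8), 4, toy_wer)
    print(len(top), sorted(top[0][1]), top[0][0])  # 70 [0, 1, 3, 6] 40.5
    for subset, wer in backward_elimination(range(8), toy_wer):
        print(len(subset), sorted(subset), wer)
```

Under this toy score, the four individually strongest channels {0, 1, 2, 3} are not the best 4-channel subset, because channels 3 and 6 are assumed to be complementary. This mirrors the kind of interaction the analysis looks for, but the numbers themselves are invented.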

Abstract

Surface electromyography (EMG) is a promising modality for silent speech interfaces, but its effectiveness depends heavily on sensor placement and channel availability. In this work, we investigate the contribution of individual and combined EMG channels to speech reconstruction performance. Our findings reveal that while certain EMG channels are individually more informative, the highest performance arises from subsets that leverage complementary relationships among channels. We also analyze phoneme classification accuracy under channel ablations and observe interpretable patterns reflecting the anatomical roles of the underlying muscles. To address performance degradation from channel reduction, we pretrain models on full 8-channel data using random channel dropout and fine-tune them on reduced-channel subsets. Fine-tuning consistently outperforms training from scratch in 4–6-channel settings, with the best dropout strategy depending on the number of channels. These results suggest that performance degradation from sensor reduction can be mitigated through pretraining and channel-aware design, supporting the development of lightweight and practical EMG-based silent speech systems.
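The pretrain-then-fine-tune recipe can be illustrated with two input transforms. This is a sketch under stated assumptions, not the paper's implementation: `random_channel_dropout` zeroes a uniformly random number of channels (up to a hypothetical `max_drop`) in a `(time, channels)` array during pretraining, and `mask_to_subset` zero-fills the channels absent from a reduced montage so the pretrained 8-channel input layer can be reused during fine-tuning. Zero-filling is an assumption here, loosely motivated by the masking block that Figure 1 shows only in the fine-tuning variants; the actual dropout schedules compared in the paper are not described in this summary.

```python
import numpy as np

NUM_CHANNELS = 8  # full EMG montage used for pretraining


def random_channel_dropout(emg: np.ndarray, rng: np.random.Generator, max_drop: int = 4) -> np.ndarray:
    """Zero out a uniformly random number (0..max_drop) of channels.

    `emg` has shape (time, channels); the input array is not modified.
    """
    out = emg.copy()
    n_drop = int(rng.integers(0, max_drop + 1))  # upper bound is exclusive
    dropped = rng.permutation(out.shape[1])[:n_drop]  # distinct channel indices
    out[:, dropped] = 0.0
    return out


def mask_to_subset(emg: np.ndarray, keep) -> np.ndarray:
    """Zero every channel not in `keep`, so a reduced-channel input keeps the
    8-channel layout expected by the pretrained model (hypothetical design)."""
    mask = np.zeros(emg.shape[1], dtype=emg.dtype)
    mask[list(keep)] = 1
    return emg * mask  # broadcasts the (channels,) mask over time


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((200, NUM_CHANNELS)).astype(np.float32)
    pretrain_input = random_channel_dropout(x, rng)        # pretraining input
    finetune_input = mask_to_subset(x, keep={0, 1, 3, 6})  # fine-tuning input
    print((np.abs(pretrain_input).sum(axis=0) == 0).sum(), "channels dropped")
    print(np.flatnonzero(np.abs(finetune_input).sum(axis=0)))  # [0 1 3 6]
```

Varying `max_drop` (or fixing the number of dropped channels) is one simple way to express different dropout strategies; which variant suits which target channel count is exactly what the paper reports it had to determine empirically.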

Paper Structure

This paper contains 20 sections, 2 equations, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: Overview of the EMG-to-speech framework. The masking block appears only in our fine-tuning variants. EMG channel locations are illustrated in accordance with the description below.
  • Figure 2: WERs of models using different numbers of EMG channels. Each dot represents a model configuration, and the blue line connects the best-performing configuration at each channel count.
  • Figure 3: Effect of channel dropout and fine-tuning on WER across different channel configurations. 8-channel performance is shown as a reference.
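The "best-performing configuration at each channel count" traced by the blue line in Figure 2 amounts to a per-group minimum over WER. A minimal sketch, assuming results are available as hypothetical `(channel_subset, wer)` pairs:

```python
from typing import Dict, FrozenSet, Iterable, Tuple


def best_per_channel_count(
    results: Iterable[Tuple[FrozenSet[int], float]],
) -> Dict[int, Tuple[FrozenSet[int], float]]:
    """Return the lowest-WER configuration for each channel count, ordered by count."""
    best: Dict[int, Tuple[FrozenSet[int], float]] = {}
    for subset, wer in results:
        n = len(subset)
        # Keep this configuration if it is the first or the best seen for its size.
        if n not in best or wer < best[n][1]:
            best[n] = (subset, wer)
    return dict(sorted(best.items()))
```

Connecting the returned values in order of channel count gives the frontier line; every other configuration appears only as an individual dot.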