SilentWear: an Ultra-Low Power Wearable System for EMG-based Silent Speech Recognition

Giusy Spacone, Sebastian Frey, Giovanni Pollo, Alessio Burrello, Daniele Jahier Pagliari, Victor Kartsch, Andrea Cossettini, Luca Benini

TL;DR

This work presents SilentWear, a fully wearable, textile-based neck interface for EMG signal acquisition and processing, and proposes an incremental fine-tuning strategy, demonstrating more than 10% accuracy recovery with less than 10 minutes of additional user data.

Abstract

Detecting speech from biosignals is gaining increasing attention due to the potential to develop human-computer interfaces that are noise-robust, privacy-preserving, and scalable for both clinical applications and daily use. However, most existing approaches remain limited by insufficient wearability and the lack of edge-processing capabilities, which are essential for minimally obtrusive, responsive, and private assistive technologies. In this work, we present SilentWear, a fully wearable, textile-based neck interface for EMG signal acquisition and processing. Powered by BioGAP-Ultra, the system enables end-to-end data acquisition from 14 differential channels and on-device speech recognition. SilentWear is coupled with SpeechNet, a lightweight 15k-parameter CNN architecture specifically tailored for EMG-based speech decoding, achieving an average cross-validated accuracy of 84.8$\pm$4.6% and 77.5$\pm$6.6% for vocalized and silent speech, respectively, over eight representative human-machine interaction commands collected over multiple days. We evaluate robustness to repositioning induced by multi-day use. In an inter-session setting, the system achieves average accuracies of 71.1$\pm$8.3% and 59.3$\pm$2.2% for vocalized and silent speech, respectively. To mitigate performance degradation due to repositioning, we propose an incremental fine-tuning strategy, demonstrating more than 10% accuracy recovery with less than 10 minutes of additional user data. Finally, we demonstrate end-to-end real-time on-device speech recognition on a commercial multi-core microcontroller unit (MCU), achieving an energy consumption of 63.9 $\mu$J per inference with a latency of 2.47 ms. With a total power consumption of 20.5 mW for acquisition, inference, and wireless transmission of results, SilentWear enables continuous operation for more than 27 hours.
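The energy and runtime figures quoted in the abstract can be cross-checked with a short back-of-the-envelope calculation. The sketch below is illustrative only: the function names are hypothetical, and the 3.7 V nominal cell voltage is an assumption (the abstract does not state the battery used). It derives the average power drawn while an inference is running and the minimum usable battery energy implied by 27 hours at 20.5 mW.

```python
# Figures quoted in the abstract.
ENERGY_PER_INFERENCE_J = 63.9e-6  # 63.9 uJ per inference
INFERENCE_LATENCY_S = 2.47e-3     # 2.47 ms per inference
TOTAL_POWER_W = 20.5e-3           # acquisition + inference + wireless transmission
RUNTIME_H = 27.0                  # "more than 27 hours", so treated as a lower bound

# ASSUMPTION: nominal single-cell Li-Po voltage; not stated in the abstract.
ASSUMED_CELL_VOLTAGE_V = 3.7


def active_inference_power_w(energy_j: float, latency_s: float) -> float:
    """Average power drawn while one inference is executing."""
    return energy_j / latency_s


def implied_energy_wh(power_w: float, hours: float) -> float:
    """Energy consumed at a constant power draw over the given duration."""
    return power_w * hours


def implied_capacity_mah(power_w: float, hours: float, voltage_v: float) -> float:
    """Charge capacity matching the energy above at an assumed cell voltage."""
    return implied_energy_wh(power_w, hours) / voltage_v * 1000.0


if __name__ == "__main__":
    p_inf = active_inference_power_w(ENERGY_PER_INFERENCE_J, INFERENCE_LATENCY_S)
    e_wh = implied_energy_wh(TOTAL_POWER_W, RUNTIME_H)
    c_mah = implied_capacity_mah(TOTAL_POWER_W, RUNTIME_H, ASSUMED_CELL_VOLTAGE_V)
    print(f"Power during inference: {p_inf * 1e3:.1f} mW")
    print(f"Minimum usable energy:  {e_wh:.4f} Wh")
    print(f"Capacity at {ASSUMED_CELL_VOLTAGE_V} V: {c_mah:.0f} mAh")
```

The power during an inference (about 26 mW) exceeds the 20.5 mW system average, which is consistent with the inference running only for a short fraction of each processing period. The capacity value depends entirely on the assumed voltage and ignores conversion losses.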

Paper Structure (19 sections, 1 equation, 9 figures, 4 tables)


Figures (9)

  • Figure 1: a) SilentWear neckband. The interface features 14 EMG differential acquisition channels; 4 channels provide the ground reference. b) Overview of the acquisition hardware, based on BioGAP-Ultra. The system features a baseboard, for BLE communication and on-device processing, and an EMG acquisition board, with a total size of $26\times65\times13\,\text{mm}^{3}$.
  • Figure 2: Data collection protocol. (a) Recordings are organized into sessions conducted over multiple days. Each session consists of 10 batches (5 vocalized and 5 silent), executed alternately, with 2-minute rest intervals between batches. (b) In each batch, subjects repeat 8 HMI-related commands, each performed 20 times in randomized order.
  • Figure 3: Visualisation of the signals of the 14 EMG Channels for the 8 HMI commands, for vocalized (top) and silent (bottom) recordings.
  • Figure 4: Per-subject confusion matrices for the a) Global and b) Inter-Session evaluation settings, for vocalized and silent speech, with an EMG window size of $1400\,\text{ms}$.
  • Figure 5: ITR analysis for vocalized speech under the Inter-Session evaluation setting. Blue curves (left axis) show mean accuracy and red curves (right axis) show mean ITR as a function of window size. For S01–S04, markers with error bars represent the mean and standard deviation across sessions. In the average plot, markers with error bars represent the mean and standard deviation across subjects.
  • ...and 4 more figures
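
The Figure 5 caption relates accuracy and information transfer rate (ITR) to the EMG window size. The ITR definition used in the paper is not reproduced in this listing, so the sketch below assumes the widely used Wolpaw formulation, with the number of classes set to the 8 HMI commands and the selection time set equal to the window size. Both choices, and the function names, are assumptions made for illustration rather than the authors' confirmed method.

```python
import math


def bits_per_selection(n_classes: int, accuracy: float) -> float:
    """Wolpaw information per selection, in bits (assumed ITR definition)."""
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    bits = math.log2(n_classes)
    # Skip terms whose limit is zero to avoid evaluating log2(0).
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    return bits


def itr_bits_per_min(n_classes: int, accuracy: float, selection_time_s: float) -> float:
    """ITR in bits/min, assuming one decision per selection_time_s seconds."""
    if selection_time_s <= 0.0:
        raise ValueError("selection_time_s must be positive")
    return bits_per_selection(n_classes, accuracy) * 60.0 / selection_time_s


if __name__ == "__main__":
    # Illustrative inputs only: 8 commands, a 1.4 s window, and a range of accuracies.
    for acc in (0.6, 0.7, 0.8, 0.9):
        print(f"accuracy={acc:.1f} -> {itr_bits_per_min(8, acc, 1.4):.1f} bits/min")
```

Under this definition a shorter window raises the ITR only as long as the accuracy lost to the shorter window does not outweigh the gain in decisions per minute, which is the trade-off the accuracy and ITR curves in Figure 5 visualize.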