TinyMyo: a Tiny Foundation Model for Flexible EMG Signal Processing at the Edge

Matteo Fasulo, Giusy Spacone, Thorir Mar Ingolfsson, Yawei Li, Luca Benini, Andrea Cossettini

TL;DR

This work tackles the generalization and edge-deployability gap in EMG processing by introducing TinyMyo, a compact 3.6M-parameter Transformer encoder pre-trained with self-supervised masked reconstruction on diverse EMG datasets. The model supports multiple downstream tasks (gesture classification, kinematic regression, and speech-related tasks) through lightweight task-specific heads, achieving state-of-the-art or competitive results on NinaPro DB5, EPN-612, UCI-EMG, NinaPro DB8, and the Gaddy Silent Speech Dataset. TinyMyo is also deployed on an ultra-low-power GAP9 MCU with an average power envelope of $36.45\,\text{mW}$ and 12.2 s latency, demonstrating practical edge deployment. The authors open-source the pre-trained backbone and downstream architectures to accelerate future EMG research and provide a common foundation for diverse sensing configurations.

Abstract

Surface electromyography (EMG) is a non-invasive sensing modality used in several domains, including biomechanics, rehabilitation, prosthetic control, and emerging human-machine interaction paradigms. Despite decades of use, significant challenges remain in achieving robust generalization across subjects, recording systems, and acquisition protocols. To tackle these challenges, foundation models (FMs) are gaining traction when targeting end-to-end applications based on EMG signals. Yet, existing EMG FMs remain limited to single downstream tasks and lack deployability on embedded platforms. In this work, we present TinyMyo, a lightweight FM based on a Transformer encoder architecture. The model is pre-trained in a self-supervised manner on publicly available datasets and achieves high reconstruction fidelity with only 3.6M parameters. With minimal task-specific head adaptations, the same backbone is used to tackle multiple downstream tasks, leveraging datasets acquired from diverse sensing locations and hardware platforms. We demonstrate generalization across hand gesture classification, hand kinematic regression, and speech production and recognition, with performance comparable to or surpassing the state of the art (SoA), and model size below 5M parameters. We achieve SoA results compared to previous FM-based works on the NinaPro DB5 ($89.4\pm0.16\%$), UCI-EMG ($97.56\pm0.32\%$), and EPN-612 ($96.74\pm0.09\%$) datasets. We report, to the best of our knowledge, the first deployment of an EMG FM on an ultra-low-power microcontroller (GAP9), achieving an average power envelope of 36.45 mW. By open-sourcing the pre-trained backbone and the downstream task architectures (https://github.com/pulp-bio/BioFoundation), we aim to provide a flexible resource that can accelerate future research and serve as a common foundation for the EMG community.

Paper Structure

This paper contains 19 sections, 10 equations, 2 figures, 11 tables.

Figures (2)

  • Figure 1: Overview of the TinyMyo architecture. (a) Pre-training framework: the input signal is first tokenized using a channel-independent patching strategy with random masking, processed by eight bidirectional Transformer Encoder blocks, and reconstructed by a Lightweight Decoder. (b) Downstream architectures: after pre-training, the Decoder is removed and replaced with task-specific heads. For Hand-Gesture Classification and Kinematic Regression, a linear head is used to produce output classes or joint angles. For speech-related tasks, the EMG input is first downsampled through residual convolutional blocks and then passed through the pre-trained Transformer Encoder, followed by a linear head. The output is text for the Speech Recognition task. For the Speech Production task, a vocoder (HiFi-GAN) is added to produce audio from the predicted MFCC features. (c) Model Deployment: the architecture for the hand-gesture classification task is implemented on the GAP9 microcontroller.
  • Figure 2: Example of reconstruction for an EMG window by the proposed foundation model. Grey areas correspond to the masked signal regions, white areas to the unmasked.
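
The channel-independent patching and random masking named in the Figure 1 caption can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The window shape (16 channels, 1000 samples), the patch length of 20, the 50% mask ratio, zero-filling of masked patches, and scoring the reconstruction error only on masked patches are all assumptions chosen for illustration, and the function names `patchify`, `random_mask`, and `masked_mse` are hypothetical.

```python
import numpy as np


def patchify(x, patch_len):
    """Split a (channels, samples) window into per-channel patches.

    Each channel is cut independently into non-overlapping patches, so a
    token never mixes samples from two channels. Returns an array of shape
    (channels * n_patches, patch_len), ordered channel by channel.
    """
    channels, samples = x.shape
    n_patches = samples // patch_len
    x = x[:, : n_patches * patch_len]  # drop any incomplete trailing patch
    return x.reshape(channels * n_patches, patch_len)


def random_mask(n_tokens, mask_ratio, rng):
    """Return a boolean vector that is True for the tokens to be masked."""
    n_masked = int(round(mask_ratio * n_tokens))
    mask = np.zeros(n_tokens, dtype=bool)
    mask[rng.choice(n_tokens, size=n_masked, replace=False)] = True
    return mask


def masked_mse(target, prediction, mask):
    """Mean squared error over masked tokens only (an assumed loss choice)."""
    return float(np.mean((target[mask] - prediction[mask]) ** 2))


rng = np.random.default_rng(0)
window = rng.standard_normal((16, 1000))  # assumed (channels, samples) shape
tokens = patchify(window, patch_len=20)   # assumed patch length
mask = random_mask(len(tokens), mask_ratio=0.5, rng=rng)  # assumed ratio

# Assumption: masked patches are zero-filled before entering the encoder;
# a real model may substitute a learnable mask token instead.
corrupted = np.where(mask[:, None], 0.0, tokens)
```

In a pre-training loop, `corrupted` would be fed to the encoder and decoder, and the reconstructed patches compared against `tokens` with `masked_mse`. The grey regions in Figure 2 correspond to the patches where `mask` is True.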