SplashNet: Split-and-Share Encoders for Accurate and Efficient Typing with Surface Electromyography

Nima Hadidi, Jason Chan, Ebrahim Feghhi, Jonathan C. Kao

TL;DR

SplashNet tackles cross‑user generalization in wrist sEMG typing by introducing three simple, causal components: Rolling Time Normalization (RTN) for per‑session normalization, Aggressive Channel Masking (ACM) to emphasize transferable low‑order features, and Split‑and‑Share encoders that respect the bilateral structure of typing while sharing weights across hands for computational efficiency. Combined with reduced spectral granularity, these priors enable on‑device inference with substantial improvements in zero‑shot and fine‑tuned character error rates, achieving a new state of the art on the emg2qwerty benchmark while using far fewer parameters and FLOPs. The approach demonstrates that principled inductive biases can rival data scaling for sEMG decoding, offering practical pathways toward keyboard‑quality wrist EMG interfaces for AR/VR and assistive technologies. Limitations include differences under greedy decoding, the need for validation on diverse populations, and engineering steps toward fully split on‑band inference; future work could extend RTN/ACM to other EMG tasks and explore hybrid cross‑hand interactions.
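To make the idea of a causal, per-session normalization concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes that the normalization is a running z-score per feature, computed over all frames seen so far in the session (Welford's online algorithm). The class name `RollingNormalizer`, the helper `normalize_session`, and the `eps` value are hypothetical, and the paper's actual window length and statistics are not given in this summary.

```python
import math


class RollingNormalizer:
    """Causal running z-score per feature (assumed form, not the paper's exact RTN)."""

    def __init__(self, num_features, eps=1e-6):
        self.n = 0                          # frames seen so far in this session
        self.mean = [0.0] * num_features    # running mean per feature
        self.m2 = [0.0] * num_features      # running sum of squared deviations
        self.eps = eps                      # avoids division by zero early on

    def step(self, frame):
        """Update the statistics with `frame`, then return the normalized frame."""
        self.n += 1
        out = []
        for i, x in enumerate(frame):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])
            var = self.m2[i] / self.n
            out.append((x - self.mean[i]) / math.sqrt(var + self.eps))
        return out


def normalize_session(frames):
    """Normalize one session frame by frame, using only past and current frames."""
    norm = RollingNormalizer(len(frames[0]))
    return [norm.step(f) for f in frames]
```

Because each output depends only on frames up to the current one, the sketch can run online. Two sessions that differ only by a per-feature scale and offset map to nearly identical normalized sequences, which is the kind of cross-user scale and bias mismatch the normalization is meant to remove.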

Abstract

Surface electromyography (sEMG) at the wrists could enable natural, keyboard-free text entry, yet the state-of-the-art emg2qwerty baseline still misrecognizes $51.8\%$ of characters in the zero-shot setting on unseen users and $7.0\%$ after user-specific fine-tuning. We trace many of these errors to mismatched cross-user signal statistics, fragile reliance on high-order feature dependencies, and the absence of architectural inductive biases aligned with the bilateral nature of typing. To address these issues, we introduce three simple modifications: (i) Rolling Time Normalization, which adaptively aligns input distributions across users; (ii) Aggressive Channel Masking, which encourages reliance on low-order feature combinations more likely to generalize across users; and (iii) a Split-and-Share encoder that processes each hand independently with weight-shared streams to reflect the bilateral symmetry of the neuromuscular system. Combined with a five-fold reduction in spectral resolution ($33\!\rightarrow\!6$ frequency bands), these components yield a compact Split-and-Share model, SplashNet-mini, which uses only $\tfrac14$ the parameters and $0.6\times$ the FLOPs of the baseline while reducing character-error rate (CER) to $36.4\%$ zero-shot and $5.9\%$ after fine-tuning. An upscaled variant, SplashNet ($\tfrac12$ the parameters, $1.15\times$ the FLOPs of the baseline), further lowers error to $35.7\%$ and $5.5\%$, representing relative improvements of $31\%$ and $21\%$ in the zero-shot and fine-tuned settings, respectively. SplashNet therefore establishes a new state of the art without requiring additional data.
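The relative improvements quoted above follow directly from the CER figures in the abstract. The short check below recomputes them; the function and variable names are illustrative, and the SplashNet-mini percentages are derived here from the abstract's numbers rather than quoted from the paper.

```python
def relative_reduction(baseline, new):
    """Fractional reduction of `new` relative to `baseline`."""
    return (baseline - new) / baseline


# Character error rates (%) as stated in the abstract.
BASELINE = {"zero_shot": 51.8, "fine_tuned": 7.0}
MODELS = {
    "SplashNet-mini": {"zero_shot": 36.4, "fine_tuned": 5.9},
    "SplashNet": {"zero_shot": 35.7, "fine_tuned": 5.5},
}


def summarize():
    """Return the relative CER reduction (%) per model and setting, rounded to 0.1."""
    return {
        name: {
            setting: round(100 * relative_reduction(BASELINE[setting], cer), 1)
            for setting, cer in cers.items()
        }
        for name, cers in MODELS.items()
    }


if __name__ == "__main__":
    for name, row in summarize().items():
        print(name, row)
```

For SplashNet this gives 31.1% (zero-shot) and 21.4% (fine-tuned), matching the rounded 31% and 21% in the abstract. For SplashNet-mini the same calculation gives 29.7% and 15.7%.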

Paper Structure

This paper contains 36 sections, 10 equations, 5 figures, 10 tables.

Figures (5)

  • Figure 1: a) Top row: Peri-stimulus time histograms (PSTHs) for the "e" key with (top) and without (bottom) RTN for two users. Each PSTH shows the spectral features derived from the left hand, with spectrograms from the 16 electrodes concatenated together. RTN mitigates the significant differences in across-user feature scale and bias. b) Top Row: PSTHs for the "e" key from 3 training users. Note that some of User 4's features show similar patterns to User 3, while others show similar patterns to User 5. ACM isolates small feature combinations, which are more often shared across users.
  • Figure 2: a) The bilateral structure of keyboard typing. b) The Split-and-Share macro-architecture.
  • Figure 3: EMG encoder architectures. Left (blue) and right (red) hand-specific modules only process inputs from a single hand, with hand-specific weights. Joint (purple) modules jointly process inputs from both hands. Shared (gray) modules process inputs from either hand using identical weights. a) Joint-Hand baseline architecture of Sivakumar et al. (2024). b) Split-only architecture, in which hand-specific modules process signals from each hand separately. c) Split-and-Share architecture, where shared-weight modules process signals from each hand separately.
  • Figure 4: Zero-shot and fine-tuned CER distribution across users. Each of the 8 test users is represented by a dot, with lines connecting the same user across models. Boxplots depict median and interquartile ranges. Our methods improve performance for all participants relative to the baseline of Sivakumar et al. (2024), with some participants showing very large improvements: two users reach a CER between 20% and 30% in the zero-shot setting, and one user attains a CER below 2% when fine-tuned.
  • Figure 5: UMAP visualization of model activations after the first TDSConv block for four models (+RSG, +RSG+ACM, +RSG+ACM+RTN, and SplashNet-mini). We extracted activations from every 100th timestep from one session of each of the 8 held-out users. Colors indicate the user identity of each point. In the models without RTN, some users’ representations occupy largely disjoint regions of the activation manifold, whereas models with RTN (+RSG+ACM+RTN and SplashNet-mini) produce markedly more overlapping per-user representations, indicating improved cross-user alignment in the learned feature space.
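The three encoder layouts described in the Figure 3 caption (joint, split-only, split-and-share) can be illustrated with a toy sketch. It uses a single ReLU-activated linear layer in plain Python in place of the paper's real encoder blocks. All function names, the layer sizes, and the choice to merge the two per-hand outputs by concatenation are assumptions made for illustration; this summary does not describe how the paper merges the streams.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Return an (out_dim x in_dim) weight matrix with small random entries."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, x):
    """Matrix-vector product followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def num_params(weights):
    """Count the scalar weights in one layer."""
    return sum(len(row) for row in weights)


def joint(joint_weights, left, right):
    """Joint-hand layout: one module sees both hands' features at once."""
    return apply_linear(joint_weights, left + right)


def split_only(left_weights, right_weights, left, right):
    """Split-only layout: each hand has its own module and its own weights."""
    return apply_linear(left_weights, left) + apply_linear(right_weights, right)


def split_and_share(shared_weights, left, right):
    """Split-and-share layout: each hand is encoded separately by the SAME weights."""
    return apply_linear(shared_weights, left) + apply_linear(shared_weights, right)


if __name__ == "__main__":
    rng = random.Random(0)
    C, H = 4, 3  # toy per-hand input and output sizes (hypothetical)
    shared = make_linear(C, H, rng)
    left, right = [0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]
    print(split_and_share(shared, left, right))
    print("shared-layer parameters:", num_params(shared))
```

In this toy setting, with equal output width, the joint layer needs a `2C x 2H` matrix, the split-only layout needs two `C x H` matrices, and the split-and-share layout needs one. Swapping the left and right inputs of `split_and_share` simply swaps the two halves of its output, which is the symmetry that weight sharing imposes. These counts describe only the toy layer and are not a claim about the paper's reported parameter ratios.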