SplashNet: Split-and-Share Encoders for Accurate and Efficient Typing with Surface Electromyography
Nima Hadidi, Jason Chan, Ebrahim Feghhi, Jonathan C. Kao
TL;DR
SplashNet tackles cross-user generalization in wrist sEMG typing with three simple, causal components: Rolling Time Normalization (RTN) for per-session normalization, Aggressive Channel Masking (ACM) to emphasize transferable low-order features, and Split-and-Share encoders that process each hand separately, reflecting the bilateral nature of typing, while sharing weights across hands for computational efficiency. Combined with reduced spectral granularity, these priors enable on-device inference and substantially improve zero-shot and fine-tuned character error rates, setting a new state of the art on the emg2qwerty benchmark with far fewer parameters and FLOPs. The approach demonstrates that principled inductive biases can rival data scaling for sEMG decoding, offering practical pathways toward keyboard-quality wrist EMG interfaces for AR/VR and assistive technologies. Limitations include differences under greedy decoding, the need for validation on diverse populations, and the engineering steps still required for fully split on-band inference. Future work could extend RTN and ACM to other EMG tasks and explore hybrid cross-hand interactions.
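To make the three components concrete, the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes that the causal normalization uses cumulative statistics over all samples seen so far, that masking zeroes whole channels at random during training, and that each hand is encoded by a single shared linear layer. The function names, array shapes, and the `tanh` layer are hypothetical and chosen only for illustration.

```python
import numpy as np


def rolling_time_normalize(x, eps=1e-6):
    """Causally standardize x (time, channels) using only samples seen so far."""
    steps = np.arange(1, x.shape[0] + 1)[:, None]
    mean = np.cumsum(x, axis=0) / steps                   # running mean per channel
    var = np.cumsum(x ** 2, axis=0) / steps - mean ** 2   # running variance per channel
    return (x - mean) / np.sqrt(np.maximum(var, 0.0) + eps)


def mask_channels(x, drop_prob, rng):
    """Zero out whole channels at random (assumed training-time augmentation)."""
    keep = rng.random(x.shape[1]) >= drop_prob
    return x * keep


def split_and_share(left, right, weights):
    """Encode each hand with the same weights, then concatenate the two streams."""
    def encode(hand):
        return np.tanh(hand @ weights)
    return np.concatenate([encode(left), encode(right)], axis=-1)


rng = np.random.default_rng(0)
left = rng.normal(size=(200, 16))    # hypothetical: 200 frames, 16 channels per wrist
right = rng.normal(size=(200, 16))
weights = rng.normal(size=(16, 8))   # one parameter set serves both hands

features = split_and_share(rolling_time_normalize(left),
                           rolling_time_normalize(right), weights)
print(features.shape)  # (200, 16)
```

Because both hands pass through the same `weights`, the encoder's parameter count does not grow with the number of streams, and because the normalization only uses past samples, it can run online without a separate calibration pass.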
Abstract
Surface electromyography (sEMG) at the wrists could enable natural, keyboard-free text entry, yet the state-of-the-art emg2qwerty baseline still misrecognizes $51.8\%$ of characters in the zero-shot setting on unseen users and $7.0\%$ after user-specific fine-tuning. We trace many of these errors to mismatched cross-user signal statistics, fragile reliance on high-order feature dependencies, and the absence of architectural inductive biases aligned with the bilateral nature of typing. To address these issues, we introduce three simple modifications: (i) Rolling Time Normalization, which adaptively aligns input distributions across users; (ii) Aggressive Channel Masking, which encourages reliance on low-order feature combinations more likely to generalize across users; and (iii) a Split-and-Share encoder that processes each hand independently with weight-shared streams to reflect the bilateral symmetry of the neuromuscular system. Combined with a five-fold reduction in spectral resolution ($33\!\rightarrow\!6$ frequency bands), these components yield a compact Split-and-Share model, SplashNet-mini, which uses only $\tfrac14$ the parameters and $0.6\times$ the FLOPs of the baseline while reducing character-error rate (CER) to $36.4\%$ zero-shot and $5.9\%$ after fine-tuning. An upscaled variant, SplashNet ($\tfrac12$ the parameters, $1.15\times$ the FLOPs of the baseline), further lowers error to $35.7\%$ and $5.5\%$, representing relative improvements of $31\%$ and $21\%$ in the zero-shot and fine-tuned settings, respectively. SplashNet therefore establishes a new state of the art without requiring additional data.
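As a check on the reported relative improvements, and assuming they are computed as the CER reduction divided by the baseline CER, the stated figures for SplashNet follow directly from the numbers above:

$$\frac{51.8 - 35.7}{51.8} = \frac{16.1}{51.8} \approx 0.311 \quad \text{(zero-shot)}, \qquad \frac{7.0 - 5.5}{7.0} = \frac{1.5}{7.0} \approx 0.214 \quad \text{(fine-tuned)}$$

These round to the quoted $31\%$ and $21\%$. Similarly, the spectral reduction from $33$ to $6$ frequency bands is a factor of $33/6 = 5.5$, consistent with the description as a five-fold reduction.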
