Poster: Recognizing Hidden-in-the-Ear Private Key for Reliable Silent Speech Interface Using Multi-Task Learning
Xuefu Dong, Liqiang Xu, Lixing He, Zengyi Han, Ken Christofferson, Yifei Chen, Akihito Taya, Yuuki Nishiyama, Kaoru Sezaki
TL;DR
HEar-ID tackles privacy-preserving silent speech interfaces by jointly authenticating the user and decoding silent spellings from in-ear audio signals. It introduces a CLWUM-based contrastive learning framework within a multi-task architecture to align genuine whisper-ultrasonic pairs while enabling word-level spelling via a CTC decoder. Preliminary results with 11 participants show robust authentication, with a false positive rate (FPR) of around 3%, and competitive spelling performance, with a mean Top-1 accuracy of 67.3% that reaches up to 90.25% for eight of the users. The work demonstrates the feasibility of secure, hands-free interaction on consumer earbud hardware by coupling private-key-style embeddings with silent speech interfaces.
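To make the idea of aligning genuine whisper-ultrasonic pairs concrete, the following is a minimal sketch of an InfoNCE-style contrastive loss. It is an illustration under stated assumptions, not the authors' CLWUM objective: the function names `cosine` and `info_nce`, the temperature value, and the use of cosine similarity are all hypothetical choices made here. Each whisper embedding is treated as the positive match for the ultrasonic embedding at the same index, and every other ultrasonic embedding in the batch acts as a negative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(whisper, ultrasonic, temperature=0.1):
    """Mean InfoNCE-style loss over a batch of embedding pairs.

    Assumption (not from the source): whisper[i] and ultrasonic[i] come from
    the same genuine utterance; all other ultrasonic[j] serve as negatives.
    """
    losses = []
    for i, w in enumerate(whisper):
        logits = [cosine(w, u) / temperature for u in ultrasonic]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of picking the genuine partner.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)
```

With this formulation the loss is small when each whisper embedding is closest to its own ultrasonic partner and large when the pairs are mismatched, which is the behaviour a contrastive authentication branch would rely on.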
Abstract
Silent speech interfaces (SSIs) enable hands-free input without audible vocalization, but most SSI systems do not verify speaker identity. We present HEar-ID, which uses consumer active noise-canceling earbuds to capture low-frequency "whisper" audio and high-frequency ultrasonic reflections. Features from both streams pass through a shared encoder, producing embeddings that feed a contrastive branch for user authentication and an SSI head for silent spelling recognition. This design supports decoding of 50 words while reliably rejecting impostors, all on commodity earbuds with a single model. Experiments demonstrate that HEar-ID achieves strong spelling accuracy and robust authentication.
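As an illustration of how a spelling head with a CTC decoder can turn per-frame outputs into a word, here is a minimal sketch of greedy (best-path) CTC decoding. The alphabet layout, the blank index, and the function name `ctc_greedy_decode` are assumptions made for this example; the source does not specify the decoding strategy or label set used by HEar-ID.

```python
BLANK = 0
# Hypothetical label set: index 0 is the CTC blank, 1..26 are the letters a..z.
ALPHABET = "_abcdefghijklmnopqrstuvwxyz"


def ctc_greedy_decode(frame_scores):
    """Greedy CTC decoding of a list of per-frame score vectors.

    Steps: take the best label in each frame, merge consecutive repeats,
    then drop blanks. A blank between two identical labels keeps both,
    which is how doubled letters survive the merge.
    """
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]
    letters = []
    previous = None
    for index in best_path:
        if index != previous and index != BLANK:
            letters.append(ALPHABET[index])
        previous = index
    return "".join(letters)
```

In a system with a fixed vocabulary, such as the 50 words mentioned above, the decoded string could then be matched against the word list; that matching step is not shown here.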
