Proactive Hearing Assistants that Isolate Egocentric Conversations
Guilin Hu, Malek Itani, Tuochao Chen, Shyamnath Gollakota
TL;DR
The paper tackles the cocktail party problem for hearing aids by proposing a real-time, on-device proactive assistant that identifies the wearer's conversational partners without explicit prompts. It introduces a dual-model architecture that uses the wearer's self-speech as an anchor and pairs a fast streaming model with a slower embedding model that captures long-range dialogue dynamics, so that partners' speech can be extracted and competing voices suppressed. Training combines synthetic, spatialized data with real-world fine-tuning, and evaluation shows generalization across languages and speaker counts, with improvements in SI-SDR improvement (SISDRi) and PESQ as well as high accuracy in identifying conversation partners. The work demonstrates the practical feasibility of on-device hearing augmentation and lays groundwork for future integration with dialogue-aware AI systems that track and maintain engagement in noisy, multi-party environments.
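For readers unfamiliar with the SISDRi metric mentioned above, the following is a minimal sketch of the standard scale-invariant SDR definition and its improvement over the unprocessed mixture. It is not the authors' evaluation code; the function names and the `eps` stabilizer are illustrative assumptions.

```python
import numpy as np

def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (standard definition)."""
    # Remove DC offset so the metric ignores constant bias.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to find the optimally scaled target.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)

def si_sdr_improvement(estimate, mixture, target):
    """SI-SDRi: gain of the model output over the raw mixture, in dB."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)
```

A positive value means the output is closer to the target speech than the raw mixture was; because of the projection step, rescaling the estimate does not change the score.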
Abstract
We introduce proactive hearing assistants that automatically identify and separate the wearer's conversation partners, without requiring explicit prompts. Our system operates on egocentric binaural audio and uses the wearer's self-speech as an anchor, leveraging turn-taking behavior and dialogue dynamics to infer conversational partners and suppress others. To enable real-time, on-device operation, we propose a dual-model architecture: a lightweight streaming model runs every 12.5 ms for low-latency extraction of the conversation partners, while a slower model runs less frequently to capture longer-range conversational dynamics. Results on real-world 2- and 3-speaker conversation test sets, collected with binaural egocentric hardware from 11 participants totaling 6.8 hours, show generalization in identifying and isolating conversational partners in multi-conversation settings. Our work marks a step toward hearing assistants that adapt proactively to conversational dynamics and engagement. More information can be found on our website: https://proactivehearing.cs.washington.edu/
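The dual-rate scheduling described in the abstract can be sketched roughly as follows. Only the 12.5 ms chunk duration comes from the abstract; the 16 kHz sample rate, the one-second slow-model interval, the function names, and the placeholder models are all hypothetical stand-ins, and a real system would likely run the slow model asynchronously rather than inline.

```python
import numpy as np

SAMPLE_RATE = 16_000                          # assumed sample rate (not stated in the abstract)
CHUNK_MS = 12.5                               # streaming interval from the abstract
CHUNK = int(SAMPLE_RATE * CHUNK_MS / 1000)    # 200 samples per chunk at 16 kHz
SLOW_EVERY = 80                               # assumed: slow model once per 80 chunks (1 s)

def placeholder_slow(history):
    """Hypothetical slow model: would summarize long-range dialogue context."""
    return 1.0  # stand-in "embedding": a scalar gain

def placeholder_fast(chunk, embedding):
    """Hypothetical fast model: would extract partner speech from one chunk."""
    return chunk * embedding

def run_dual_rate(audio, fast_model, slow_model, chunk=CHUNK, slow_every=SLOW_EVERY):
    """Process binaural audio of shape (2, n_samples) chunk by chunk."""
    n_chunks = audio.shape[-1] // chunk
    embedding = None
    outputs = []
    slow_calls = 0
    for i in range(n_chunks):
        start = i * chunk
        if i % slow_every == 0:
            # The slow model sees only audio that arrived before this chunk.
            embedding = slow_model(audio[..., :start])
            slow_calls += 1
        # The fast model runs on every chunk, conditioned on the latest embedding.
        outputs.append(fast_model(audio[..., start:start + chunk], embedding))
    if not outputs:
        return audio[..., :0], slow_calls
    return np.concatenate(outputs, axis=-1), slow_calls
```

The point of the split is that the per-chunk path stays cheap enough to meet the latency budget, while the more expensive context model is amortized over many chunks.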
