Lessons Learned from Developing a Privacy-Preserving Multimodal Wearable for Local Voice-and-Vision Inference
Yonatan Tussa, Andy Heredia, Nirupam Roy
TL;DR
This paper tackles privacy concerns in continuous multimodal wearables by designing a privacy-preserving ear-mounted device that offloads inference to a paired smartphone rather than the cloud. It details hardware integration, power budgeting, and a fully offline AI stack running on the phone (audio, vision-language, and LLM components), orchestrated by a lightweight on-device intelligence architecture. Key contributions include a practical hardware prototype, a low-power wake-word detector trained on synthetic-device data, end-to-end local inference with 2–3 s latency, and design principles for privacy, responsiveness, and social acceptability in embedded AI wearables. The findings show that fully local multimodal inference is feasible on commodity mobile hardware, and they offer concrete guidance for future embedded AI systems that respect user privacy and energy constraints.
Abstract
Many promising applications of multimodal wearables require continuous sensing and heavy computation, yet users reject such devices due to privacy concerns. This paper shares our experiences building an ear-mounted voice-and-vision wearable that performs local AI inference using a paired smartphone as a trusted personal edge. We describe the hardware-software co-design of this privacy-preserving system, including the challenges of integrating a camera, microphone, and speaker into a 30-gram form factor, enabling wake-word-triggered capture, and running quantized vision-language and large language models entirely offline. Through iterative prototyping, we identify key design hurdles in power budgeting, connectivity, latency, and social acceptability. Our initial evaluation shows that fully local multimodal inference is feasible on commodity mobile hardware with interactive latency. We conclude with design lessons for researchers developing embedded AI systems that balance privacy, responsiveness, and usability in everyday settings.
