Wireless Hearables With Programmable Speech AI Accelerators

Malek Itani, Tuochao Chen, Arun Raghavan, Gavriel Kohlberg, Shyamnath Gollakota

TL;DR

This work examines whether real-time, on-device speech enhancement is feasible for wireless hearables under stringent power and size constraints. It introduces NeuralAids, a five-board hardware platform integrating a low-power AI accelerator (GAP9) with BLE connectivity, and a custom dual-path time-frequency (TF) domain network designed for low-latency, high-quality denoising. Using mixed-precision quantization and quantization-aware training, the authors achieve real-time inference on 6 ms audio chunks at about 71.6 mW of power, with memory footprints around a few hundred kilobytes. The design is validated by end-to-end hardware runtimes under 6 ms and by a user study showing improved speech quality and noise suppression over prior on-device models. The results demonstrate that advanced speech AI models can run directly on wireless hearables, enabling fully on-device enhanced hearing and next-generation edge-enabled audio devices.

Abstract

The conventional wisdom has been that designing ultra-compact, battery-constrained wireless hearables with on-device speech AI models is challenging due to the high computational demands of streaming deep learning models. Speech AI models require continuous, real-time audio processing, imposing strict computational and I/O constraints. We present NeuralAids, a fully on-device speech AI system for wireless hearables, enabling real-time speech enhancement and denoising on compact, battery-constrained devices. Our system bridges the gap between state-of-the-art deep learning for speech enhancement and low-power AI hardware by making three key technical contributions: 1) a wireless hearable platform integrating a speech AI accelerator for efficient on-device streaming inference, 2) an optimized dual-path neural network designed for low-latency, high-quality speech enhancement, and 3) a hardware-software co-design that uses mixed-precision quantization and quantization-aware training to achieve real-time performance under strict power constraints. Our system processes 6 ms audio chunks in real time, achieving an inference time of 5.54 ms while consuming 71.6 mW. In real-world evaluations, including a user study with 28 participants, our system outperforms prior on-device models in speech quality and noise suppression, paving the way for next-generation intelligent wireless hearables that can enhance hearing entirely on-device.
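
The real-time claim in the abstract comes down to simple arithmetic: each 6 ms chunk must be processed before the next one arrives. The sketch below works through that budget using only the figures quoted above. The function names are illustrative, the 16 kHz sample rate is an assumption not stated in this summary, and the energy estimate assumes the 71.6 mW draw is constant over the chunk period.

```python
# Figures quoted in the abstract.
CHUNK_MS = 6.0        # duration of one audio chunk
INFERENCE_MS = 5.54   # reported inference time per chunk
POWER_MW = 71.6       # reported power consumption

# Assumption for illustration only: the sample rate is not given in this summary.
ASSUMED_SAMPLE_RATE_HZ = 16_000


def real_time_factor(inference_ms, chunk_ms):
    """Processing time divided by audio duration; a value below 1.0 keeps up with the stream."""
    return inference_ms / chunk_ms


def headroom_ms(inference_ms, chunk_ms):
    """Time left in each chunk period after inference finishes."""
    return chunk_ms - inference_ms


def energy_per_chunk_uj(power_mw, chunk_ms):
    """Energy per chunk period in microjoules (mW * ms = uJ), assuming constant power draw."""
    return power_mw * chunk_ms


def samples_per_chunk(sample_rate_hz, chunk_ms):
    """Number of audio samples in one chunk at the given sample rate."""
    return round(sample_rate_hz * chunk_ms / 1000)


if __name__ == "__main__":
    print(f"real-time factor: {real_time_factor(INFERENCE_MS, CHUNK_MS):.3f}")
    print(f"headroom per chunk: {headroom_ms(INFERENCE_MS, CHUNK_MS):.2f} ms")
    print(f"energy per chunk: {energy_per_chunk_uj(POWER_MW, CHUNK_MS):.1f} uJ")
    print(f"samples per chunk (assumed 16 kHz): {samples_per_chunk(ASSUMED_SAMPLE_RATE_HZ, CHUNK_MS)}")
```

With the reported numbers, inference uses roughly 92% of each chunk period, leaving under half a millisecond of slack for everything else that must fit inside the same 6 ms window.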

Paper Structure

This paper contains 26 sections, 5 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: NeuralAid hardware. The device has five interconnected flexible and rigid circuit boards that together form an AI-enabled hearable. PWR (2-layer flexible PCB): manages power, charging, and programming; BT (2-layer rigid PCB): houses the BLE SoC; AI (6-layer rigid PCB): contains a low-power AI accelerator for real-time speech AI; PERIPH (4-layer rigid PCB): hosts peripherals including RAM, NOR flash, IMU, and I2S DAC; MIC (2-layer flexible PCB): has a microphone array with three mics and two push buttons.
  • Figure 3: Efficient streaming neural network. (A) Decomposition of the end-to-end latency in streaming speech enhancement. (B) The normal overlap-add operation introduces additional algorithmic latency with lookback padding. (C) The overall architecture of our streaming speech enhancement network, with cached history states for computation reuse. (D) A dual-window approach during overlap-add can reduce the additional algorithmic latency introduced by the lookback padding.
  • Figure 4: QAT reduces the gap between our floating-point and quantized models across input noise levels.
  • Figure 5: End-to-end hardware run-time evaluation.
  • Figure 6: Wireless throughput from the NeuralAid device to a nearby receiver as a function of distance.
  • ...and 2 more figures
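
The Figure 4 caption refers to the gap between floating-point and quantized models. As a rough illustration of where that gap comes from, the sketch below applies symmetric uniform "fake quantization" (quantize, then dequantize) to a list of values at two bit widths. This is a generic textbook scheme and not necessarily the one used in the paper; the function names, the example values, and the choice of 8 and 16 bits are all assumptions for illustration.

```python
def fake_quantize(values, num_bits):
    """Symmetric uniform quantize-then-dequantize of a list of floats."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)                 # nothing to scale
    scale = max_abs / qmax                  # real value of one integer step
    out = []
    for v in values:
        q = round(v / scale)                # nearest integer level
        q = max(-qmax, min(qmax, q))        # clamp to the representable range
        out.append(q * scale)               # map back to a float
    return out


def max_abs_error(reference, approx):
    """Largest element-wise absolute difference between two equal-length lists."""
    return max(abs(r - a) for r, a in zip(reference, approx))


if __name__ == "__main__":
    # Hypothetical weights, chosen only to make the rounding error visible.
    weights = [0.9, -0.31, 0.02, 0.47]
    for bits in (8, 16):
        err = max_abs_error(weights, fake_quantize(weights, bits))
        print(f"{bits}-bit max abs error: {err:.6f}")
```

In a mixed-precision setup, different tensors would use different `num_bits`, trading memory and compute for accuracy. Quantization-aware training commonly inserts an operation like this into the forward pass during training so the model learns to tolerate the rounding error.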