SUBARU: A Practical Approach to Power Saving in Hearables Using SUB-Nyquist Audio Resolution Upsampling

Tarikul Islam Tamiti, Sajid Fardin Dipto, Luke Benjamin Baja-Ricketts, David C Vergano, Anomadarshi Barua

TL;DR

SUBARU intentionally uses sub-Nyquist sampling and low bit resolution in ADCs, achieving a 3.31x reduction in power consumption, and supports streaming operation on mobile platforms and SE in in-the-wild noisy conditions with an inference time of 1.74 ms.

Abstract

Hearables are wearable computers that are worn on the ear. Bone conduction microphones (BCMs) are used with air conduction microphones (ACMs) in hearables as a supporting modality for multimodal speech enhancement (SE) in noisy conditions. However, existing works do not consider the following practical aspects of low-power implementations on hearables: (i) they do not explore how lowering the sampling frequencies and bit resolutions in the analog-to-digital converters (ADCs) of hearables jointly impacts low-power processing and multimodal SE in terms of speech quality and intelligibility; and (ii) they do not process signals from ACMs/BCMs at a sub-Nyquist sampling rate because their frameworks lack a methodology for reconstructing wideband signals from their narrowband parts. We propose SUBARU (Sub-Nyquist Audio Resolution Upsampling), which (i) intentionally uses sub-Nyquist sampling and low bit resolution in ADCs, achieving a 3.31x reduction in power consumption; and (ii) achieves streaming operation on mobile platforms and SE in in-the-wild noisy conditions with an inference time of 1.74 ms and a memory footprint of less than 13.77 MB.
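To make the idea of a low-rate, low-bit capture concrete, the following is a minimal sketch of what reducing the sampling rate and bit resolution does to a signal. It is not the SUBARU pipeline: the 440 Hz test tone, the decimation factor of 6, and all function names are assumptions chosen for illustration, and the decimation deliberately omits the anti-aliasing filter a real front end would need.

```python
import math

def make_tone(freq_hz, rate_hz, n_samples):
    """Generate a unit-amplitude sine tone as a list of floats in [-1, 1]."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(n_samples)]

def decimate_naive(samples, factor):
    """Keep every `factor`-th sample (no anti-aliasing filter; illustration only)."""
    return samples[::factor]

def quantize(samples, bits):
    """Uniformly quantize samples in [-1, 1] onto a signed `bits`-bit grid."""
    levels = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 2047 for 12 bits
    return [round(max(-1.0, min(1.0, s)) * levels) / levels for s in samples]

def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Hypothetical test signal: 0.1 s of a 440 Hz tone at 48 kHz.
reference = make_tone(440.0, 48_000, 4_800)

# 48 kHz -> 8 kHz by keeping every 6th sample.
low_rate = decimate_naive(reference, 6)

# Quantization error at the two bit depths compared in the paper's Figure 2.
err_12 = rms_error(low_rate, quantize(low_rate, 12))
err_8 = rms_error(low_rate, quantize(low_rate, 8))
```

In this sketch the 8-bit version carries a larger quantization error than the 12-bit version, which is the direction of the quality loss the paper reports; the rate reduction additionally discards all content above half the new sampling rate, which is what a reconstruction stage would have to restore.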

Paper Structure

This paper contains 40 sections, 4 equations, 12 figures, 17 tables.

Figures (12)

  • Figure 1: We propose to use sub-Nyquist sampling and low bit resolution on hearables in a split architecture, where audio is reconstructed with low latency and high fidelity on mobile platforms.
  • Figure 2: A reference signal sampled at 48 kHz is downsampled to (a) 24 kHz, (b) 12 kHz, (c) 10 kHz, (d) 8 kHz, (e) 6 kHz, and (f) 4 kHz. (g) The LSD increases and (h) the PESQ decreases from higher to lower sampling frequencies. (g and h) The 8-bit audio has lower quality than the 12-bit audio in terms of PESQ and LSD. The circle marks indicate that downsampling reduces signal quality.
  • Figure 3: Overview of the SUBARU architecture.
  • Figure 4: SEN enhances noisy spectrograms for waveform-based processing.
  • Figure 5: The upsampling network has an overall upsampling ratio of 256x, achieved in four stages: 8x, 8x, 2x, and 2x.
  • ...and 7 more figures
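
The staged upsampling described in the Figure 5 caption can be checked with a little arithmetic. The sketch below only tracks sequence lengths through the four stages. It does not model the network layers, and the input length of 10 and the function name are hypothetical.

```python
from math import prod

# Per-stage upsampling factors as listed in the Figure 5 caption.
STAGE_FACTORS = [8, 8, 2, 2]

def lengths_after_each_stage(input_len, factors):
    """Return the sequence length after each successive upsampling stage."""
    lengths = []
    current = input_len
    for factor in factors:
        current *= factor
        lengths.append(current)
    return lengths

# Overall ratio is the product of the per-stage factors.
total_ratio = prod(STAGE_FACTORS)

# Hypothetical input of 10 time steps.
stage_lengths = lengths_after_each_stage(10, STAGE_FACTORS)

print(total_ratio)    # 256
print(stage_lengths)  # [80, 640, 1280, 2560]
```

The product of the stage factors, 8 x 8 x 2 x 2, matches the stated overall ratio of 256.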