Enhancement by postfiltering for speech and audio coding in ad-hoc sensor networks

Sneha Das; Tom Bäckström

TL;DR

The paper addresses quantization noise in ad-hoc wireless acoustic sensor networks (WASNs), where low bitrates degrade speech quality. It proposes a Bayesian postfilter that explicitly models quantization as a bin-constrained process, yielding a truncated Gaussian posterior $P(X|Y)$ and an MMSE estimator $\hat{x}_{MC}=E[X|Y]$ that fuses signals from two devices. Key contributions include per-channel truncated posteriors, a joint posterior across devices, and a practical numerical integration scheme suitable for real-world codecs. Evaluations using PSNR, PESQ, and MUSHRA show substantial gains over single-channel baselines and a diagonal MWF, with gains of up to about $22.5$ dB in PSNR and $1.8$ MOS in PESQ under certain bitrates and SNRs. The results indicate that quantization-aware postfiltering is viable for improving audio quality in WASNs and is robust to reverberation, even though room effects are not modeled explicitly. This work supports efficient, codec-friendly enhancement in multi-device sensor networks and informs bitrate allocation strategies across nodes.

Abstract

Enhancement algorithms for wireless acoustic sensor networks (WASNs) are indispensable with the increasing availability and usage of connected devices with microphones. Conventional spatial filtering approaches for enhancement in WASNs approximate quantization noise with an additive Gaussian distribution, which limits performance due to the non-linear nature of quantization noise at lower bitrates. In this work, we propose a postfilter for enhancement based on Bayesian statistics to obtain a multidevice signal estimate, which explicitly models the quantization noise. Our experiments using PSNR, PESQ and MUSHRA scores demonstrate that the proposed postfilter can be used to enhance signal quality in ad-hoc sensor networks.
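To illustrate the idea of treating quantization as a bin constraint rather than additive noise, the following is a minimal single-channel sketch and not the paper's implementation. It assumes a scalar coefficient with a Gaussian prior $X \sim \mathcal{N}(\mu, \sigma^2)$ and a decoder that knows only the quantization bin $[l, u]$ containing $X$. Under those assumptions the posterior is a truncated Gaussian, and the MMSE estimate is its mean, which has a closed form. The function name, the parameters, and the tail fallback are hypothetical choices made for this illustration.

```python
import math


def _pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_gaussian_mean(mu: float, sigma: float,
                            lower: float, upper: float) -> float:
    """Return E[X | lower <= X <= upper] for X ~ N(mu, sigma^2).

    Assumption for this sketch: the only information about X is its
    Gaussian prior and the quantization bin [lower, upper].
    """
    if sigma <= 0.0 or lower >= upper:
        raise ValueError("need sigma > 0 and lower < upper")
    a = (lower - mu) / sigma
    b = (upper - mu) / sigma
    mass = _cdf(b) - _cdf(a)  # prior probability of the bin
    if mass < 1e-12:
        # Bin lies far in the tail: the ratio below is numerically
        # unstable, so approximate with the bin edge nearest to mu.
        return min(max(mu, lower), upper)
    return mu + sigma * (_pdf(a) - _pdf(b)) / mass
```

For a bin that is symmetric around the prior mean the estimate equals the mean, while for an off-center bin it is pulled from the bin midpoint toward the prior mean, which is the behavior an additive-noise model cannot reproduce. The paper's joint posterior across two devices is more general than this scalar case and is not reproduced here.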
