Enhancement by postfiltering for speech and audio coding in ad-hoc sensor networks

Sneha Das, Tom Bäckström

TL;DR

The paper tackles quantization noise in ad-hoc wireless acoustic sensor networks (WASNs), where low bitrates degrade speech quality. It proposes a Bayesian postfilter that explicitly models quantization as a bin-constrained process, yielding a truncated Gaussian posterior $P(X|Y)$ and an MMSE estimator $\hat{x}_{MC}=E[X|Y]$ that fuses signals from two devices. Key contributions include per-channel truncated posteriors, a joint posterior across devices, and a practical numerical integration scheme suitable for real-world codecs. Evaluations using PSNR, PESQ, and MUSHRA show substantial gains over single-channel baselines and a diagonal MWF, up to about $22.5$ dB in PSNR and $1.8$ MOS in PESQ under certain bitrates and SNRs. The results indicate that quantization-aware postfiltering is viable for improving audio quality in WASNs and is robust to reverberation, despite not modeling room effects explicitly. This work supports efficient, codec-friendly enhancement in multi-device sensor networks and informs bitrate allocation strategies across nodes.
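To make the idea of a bin-constrained posterior concrete, here is a minimal Python sketch of the two ingredients named above: the closed-form mean of a truncated Gaussian (single channel) and a grid-based numerical integration of a joint posterior (two devices). It is not the authors' implementation. The zero-mean Gaussian prior, the additive Gaussian noise at each device, the function names, and all parameter values are assumptions chosen for illustration.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_gaussian_mean(mu, sigma, lower, upper):
    """E[X | lower <= X <= upper] for X ~ N(mu, sigma^2).

    Single-channel case: the decoder only knows the quantization bin
    [lower, upper], so the posterior is the prior truncated to that bin.
    """
    a = (lower - mu) / sigma
    b = (upper - mu) / sigma
    mass = norm_cdf(b) - norm_cdf(a)  # prior probability of the bin
    return mu + sigma * (norm_pdf(a) - norm_pdf(b)) / mass


def fused_mmse(bins, noise_sigmas, prior_sigma, n_grid=4001):
    """Grid approximation of E[X | all bins] for several devices.

    Assumed model (hypothetical): device k observes x + v_k with
    v_k ~ N(0, noise_sigmas[k]^2) and transmits only the bin
    (l_k, u_k) containing that value. The posterior is proportional to
    prior(x) * prod_k P(l_k <= x + v_k <= u_k | x).
    """
    lo, hi = -6.0 * prior_sigma, 6.0 * prior_sigma
    step = (hi - lo) / (n_grid - 1)
    num = 0.0
    den = 0.0
    for i in range(n_grid):
        x = lo + i * step
        weight = norm_pdf(x / prior_sigma)  # unnormalized prior
        for (l, u), s in zip(bins, noise_sigmas):
            # Likelihood that device k's noisy sample fell in its bin.
            weight *= norm_cdf((u - x) / s) - norm_cdf((l - x) / s)
        num += x * weight
        den += weight
    return num / den  # normalization constants cancel in the ratio


if __name__ == "__main__":
    # Single device: bin [0.5, 1.5] under a unit-variance prior.
    print(truncated_gaussian_mean(0.0, 1.0, 0.5, 1.5))
    # Two devices with overlapping but different bins.
    print(fused_mmse([(0.5, 1.5), (0.0, 1.0)], [0.3, 0.3], 1.0))
```

In the single-channel case the estimate always stays inside the received bin and is pulled from the bin midpoint toward the prior mean. In the fused case the bins from both devices jointly narrow the plausible range of the clean sample, which is the intuition behind combining per-channel posteriors.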

Abstract

Enhancement algorithms for wireless acoustic sensor networks (WASNs) are indispensable with the increasing availability and usage of connected devices with microphones. Conventional spatial filtering approaches for enhancement in WASNs approximate quantization noise with an additive Gaussian distribution, which limits performance due to the non-linear nature of quantization noise at lower bitrates. In this work, we propose a postfilter for enhancement based on Bayesian statistics to obtain a multidevice signal estimate, which explicitly models the quantization noise. Our experiments using PSNR, PESQ and MUSHRA scores demonstrate that the proposed postfilter can be used to enhance signal quality in ad-hoc sensor networks.

Paper Structure

This paper contains 4 sections, 5 equations, 5 figures.

Figures (5)

  • Figure 1: Distribution of microphones in the ad-hoc acoustic sensor network.
  • Figure 2: Block diagrams showing (a) the overall system structure with the location of the postfilter, and (b) overview of the postfilter.
  • Figure 3: Differential PSNR and PESQ scores of the proposed multidevice estimate relative to the single-channel baselines and the multichannel Wiener filter at $R=\{16, 32\}$ kbps, with 95% confidence intervals. $\rho_{(\text{MC}-\text{BL\_B})}$ and $\gamma_{(\text{MC}-\text{BL\_B})}$ are the differential PSNR and PESQ of the proposed multidevice estimate with respect to the single-channel estimate of device B; $\rho_{(\text{MC}-\text{BL\_A})}$ and $\gamma_{(\text{MC}-\text{BL\_A})}$ are the differential scores of the multidevice estimate with respect to the single-channel estimate of device A; $\rho_{(\text{MC}-\text{MWF})}$ and $\gamma_{(\text{MC}-\text{MWF})}$ are the differential scores of the multidevice estimate with respect to the multichannel Wiener filter.
  • Figure 4: Distribution of $\Delta$MUSHRA points from the subjective listening test. $\eta_{(\text{MC}-\text{BL\_B})}$ and $\eta_{(\text{MC}-\text{BL\_A})}$ are the differential MUSHRA scores of the multidevice estimate with respect to the single-channel estimates at device B and device A, respectively. Mean-F and Mean-M are the average differential scores over the female and male items, respectively.
  • Figure 5: Contour plot of the differential PESQ, $\gamma_{(\text{MC}-\text{BL\_B})}$, jointly over bitrate, input SNR and absorption coefficient.