Deep learning based spatial aliasing reduction in beamforming for audio capture
Mateusz Guzik, Giulio Cengarle, Daniel Arteaga
TL;DR
Spatial aliasing limits the performance of beamforming with spaced microphone arrays by introducing direction-dependent distortions at high frequencies. The authors propose a U-Net that predicts a multichannel de-aliasing filter $\mathbf{F}_{ft}$, which is applied to the virtual microphone signals in each time-frequency bin before a fixed decoder $\mathbf{D}$ maps them to alias-free outputs. Training is supervised with targets obtained from an alias-free encoder $\mathbf{E}$, and the PHASEN loss aligns the de-aliased output with these targets. Two experimental setups, stereo capture with cardioid pairs and 2D first-order Ambisonics (FOA) decoding, show substantial objective (C-Si-SNR) and subjective (MUSHRA) improvements over conventional beamforming, both with fixed and with varying microphone spacings. Modeling cross-channel dependencies helps in some of the setups. The framework can be adapted to other spatial audio pipelines and decoders, and extensions to reverberant conditions and frequency-dependent polar patterns are proposed as future work. Overall, the work shows that deep learning can effectively mitigate spatial aliasing in beamforming, enabling higher-fidelity audio capture with multi-microphone arrays.
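The signal flow summarized in the TL;DR can be sketched, for a single time-frequency bin, as $\mathbf{y} = \mathbf{D}\,\mathbf{F}_{ft}\,\mathbf{x}$, where $\mathbf{x}$ holds the virtual microphone signals. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the matrix-vector form, the channel counts, and all helper names (`matvec`, `diagonal_filter`, `dealias_bin`, `phasen_style_loss`) are hypothetical. A diagonal $\mathbf{F}_{ft}$ stands for the channel-independent filter and a full matrix for the cross-channel one. The loss assumes the PHASEN formulation as commonly described (power-law compression with $p = 0.3$, equal weighting of a compressed-magnitude term and a compressed-complex term), and `ref` is a stand-in for the alias-free target derived from $\mathbf{E}$.

```python
from __future__ import annotations

import cmath

# Hypothetical sketch of per-bin de-aliasing, y = D @ (F @ x), in pure Python.
# x: M virtual-microphone STFT coefficients for one (f, t) bin.
# F: M x M de-aliasing filter (diagonal = channel-independent, full = cross-channel).
# D: K x M fixed decoder (e.g. K = 2 for stereo).


def matvec(a: list[list[complex]], x: list[complex]) -> list[complex]:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def diagonal_filter(gains: list[complex]) -> list[list[complex]]:
    """Channel-independent filter: one complex gain per channel, no mixing."""
    n = len(gains)
    return [[gains[i] if i == j else 0j for j in range(n)] for i in range(n)]


def dealias_bin(f_mat: list[list[complex]], x: list[complex],
                d_mat: list[list[complex]]) -> list[complex]:
    """Filter the virtual-microphone signals, then apply the fixed decoder."""
    return matvec(d_mat, matvec(f_mat, x))


def compress(z: complex, p: float = 0.3) -> complex:
    """Power-law compress the magnitude and keep the phase: |z|^p * exp(j*angle(z))."""
    return abs(z) ** p * cmath.exp(1j * cmath.phase(z))


def phasen_style_loss(est: list[complex], ref: list[complex], p: float = 0.3) -> float:
    """Assumed PHASEN-style loss: 0.5 * MSE of compressed magnitudes
    + 0.5 * MSE of compressed complex values (normalization conventions vary)."""
    n = len(est)
    mag = sum((abs(e) ** p - abs(r) ** p) ** 2 for e, r in zip(est, ref)) / n
    cplx = sum(abs(compress(e, p) - compress(r, p)) ** 2 for e, r in zip(est, ref)) / n
    return 0.5 * mag + 0.5 * cplx


# Usage: a 2-channel bin, a channel-independent filter, and an arbitrary 2 x 2 decoder.
x = [1 + 0j, 2 + 0j]
y = dealias_bin(diagonal_filter([2.0, 0.5]), x, [[0.5, 0.5], [0.5, -0.5]])
print(y)  # [(1.5+0j), (0.5+0j)]
```

Because the complex term is phase-sensitive, `phasen_style_loss([1+0j], [-1+0j])` evaluates to about 2 even though the magnitudes match, whereas a magnitude-only loss would be 0. This is one reason a phase-aware loss is a natural fit when the target is a spatial (phase-dependent) signal.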
Abstract
Spatial aliasing affects spaced microphone arrays, causing directional ambiguity above a frequency determined by the microphone spacing and degrading the spatial and spectral accuracy of beamformers. Given the limitations of conventional signal processing and the scarcity of deep learning approaches to spatial aliasing mitigation, we propose a novel approach that uses a U-Net architecture to predict a signal-dependent de-aliasing filter, which reduces aliasing in conventional beamforming for spatial capture. Two types of multichannel filters are considered: one that treats the channels independently and one that models cross-channel dependencies. The proposed approach is evaluated in two common spatial capture scenarios: stereo and first-order Ambisonics. The results show a substantial improvement over conventional beamforming in both objective and perceptual evaluations. This work shows the potential of deep learning to reduce aliasing in beamforming, leading to improvements in multi-microphone capture setups.
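For a pair of omnidirectional microphones spaced $d$ apart, the frequency above which aliasing sets in follows from the half-wavelength spatial sampling criterion: $f_{\text{alias}} = c / (2d)$, the frequency at which the inter-microphone phase difference for an endfire source reaches $\pi$. The sketch below is a standard textbook illustration, not taken from the paper; the speed of sound (343 m/s), the 10 cm spacing, and the helper names `aliasing_frequency` and `wrapped_phase_difference` are assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def aliasing_frequency(spacing_m: float, c: float = SPEED_OF_SOUND) -> float:
    """Frequency above which the spacing exceeds half a wavelength: f = c / (2 d)."""
    return c / (2.0 * spacing_m)


def wrapped_phase_difference(freq_hz: float, spacing_m: float, angle_deg: float,
                             c: float = SPEED_OF_SOUND) -> float:
    """Phase difference between two omni microphones for a far-field plane wave,
    wrapped to [-pi, pi]. angle_deg is measured from the array axis (0 = endfire)."""
    delay_s = spacing_m * math.cos(math.radians(angle_deg)) / c
    return math.remainder(2.0 * math.pi * freq_hz * delay_s, 2.0 * math.pi)


d = 0.10  # hypothetical 10 cm spacing
f_alias = aliasing_frequency(d)
print(round(f_alias, 1))  # 1715.0

# Below f_alias, endfire (0 deg) and broadside (90 deg) give distinct phase differences.
print(wrapped_phase_difference(f_alias / 2, d, 0.0), wrapped_phase_difference(f_alias / 2, d, 90.0))
# At 2 * f_alias both wrap to (numerically) zero, so the two directions are indistinguishable.
print(wrapped_phase_difference(2 * f_alias, d, 0.0), wrapped_phase_difference(2 * f_alias, d, 90.0))
```

With a 10 cm spacing the limit is about 1715 Hz; at twice that frequency an endfire and a broadside source produce the same wrapped phase difference, which is the directional ambiguity that a conventional beamformer cannot resolve from phase alone.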
