Deep learning-based filtering of cross-spectral matrices using generative adversarial networks
Christof Puhle
TL;DR
The paper tackles the filtering of cross-spectral matrices from microphone arrays to mitigate ambient noise, reflections, and source directivity prior to acoustic source localization. It introduces a complex-valued GAN-based framework comprising a generator and a discriminator that learn transformations between input and target cross-spectral matrices, guided by a transformation map and a dedicated loss combining adversarial and transformation-consistency terms. Models trained on five transformation tasks derived from simulated sound pressure data achieve high accuracy on the test sets, with the auto-encoder baseline performing strongest and the directivity-removal tasks proving most challenging. This preprocessing step could improve downstream sound source localization (SSL) methods and standard beamforming techniques by providing cleaner, phase-preserving cross-spectral representations.
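The excerpt does not spell out the loss function. As a hypothetical sketch only, the generator objective below combines a non-saturating adversarial term computed from discriminator logits with a lam-weighted mean squared error between generated and target complex CSMs, used here as a stand-in for the transformation-consistency term. The squared-error form, the default weight lam=100, and the function names generator_loss and discriminator_loss are all assumptions, not the paper's definitions. Because the error is taken on the complex entries, it penalizes phase errors as well as magnitude errors.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x))
    return np.logaddexp(0.0, x)

def generator_loss(d_logits_fake, csm_generated, csm_target, lam=100.0):
    """Hypothetical generator loss: adversarial term + lam * consistency term.

    d_logits_fake: discriminator logits for generated CSMs, shape (B,)
    csm_generated, csm_target: complex arrays, shape (B, M, M)
    lam: weight of the consistency term (assumed value, not from the paper)
    """
    # Non-saturating GAN term: -log(sigmoid(logit)) == softplus(-logit)
    adversarial = np.mean(softplus(-d_logits_fake))
    # Mean squared magnitude of the complex error; a pure phase error
    # between two CSMs is penalized just like an amplitude error
    consistency = np.mean(np.abs(csm_generated - csm_target) ** 2)
    return adversarial + lam * consistency

def discriminator_loss(d_logits_real, d_logits_fake):
    # Binary cross-entropy on logits: target CSMs labeled 1, generated labeled 0
    # -log(sigmoid(l)) == softplus(-l); -log(1 - sigmoid(l)) == softplus(l)
    return np.mean(softplus(-d_logits_real)) + np.mean(softplus(d_logits_fake))
```

In an actual training loop these terms would be written in an autodiff framework so that gradients reach the generator and discriminator; NumPy is used here only to keep the arithmetic explicit.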
Abstract
In this paper, we present a deep-learning method to filter out effects such as ambient noise, reflections, or source directivity from microphone array data represented as cross-spectral matrices. Specifically, we focus on a generative adversarial network (GAN) architecture designed to transform fixed-size cross-spectral matrices. The models were trained on sound pressure simulations of varying complexity developed for this purpose. Using the hyperparameters obtained from optimizing the model on an auto-encoding task, we then trained it to perform five distinct transformation tasks derived from the different complexities inherent in our sound pressure simulations.
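For M microphones at a single frequency, the cross-spectral matrix (CSM) is the M x M Hermitian matrix C = E[p p^H] of the complex microphone pressures p, which is why the network inputs have a fixed size for a given array. The sketch below is not the authors' simulation code; it assumes a single free-field monopole (speed of sound 343 m/s, exp(-jkr) convention), a unit-amplitude source with a random phase per snapshot, and ambient noise modeled as complex Gaussian noise that is uncorrelated across microphones. The names monopole_pressures, estimate_csm and simulate_noisy_csm are hypothetical. Under these assumptions, the noise adds roughly noise_var to the diagonal while the off-diagonal entries stay close to the clean p p^H.

```python
import numpy as np

def monopole_pressures(mic_pos, src_pos, freq, c=343.0):
    """Complex free-field pressure of a unit monopole at each microphone.

    mic_pos: (M, 3) microphone coordinates in m; src_pos: (3,) in m.
    Uses the free-field Green's function exp(-1j*k*r) / (4*pi*r), k = 2*pi*f/c.
    """
    k = 2.0 * np.pi * freq / c
    r = np.linalg.norm(mic_pos - src_pos, axis=1)  # source-mic distances, (M,)
    return np.exp(-1j * k * r) / (4.0 * np.pi * r)

def estimate_csm(snapshots):
    """Estimate the M x M CSM as the mean of p p^H over K snapshots (K, M)."""
    return snapshots.T @ snapshots.conj() / snapshots.shape[0]

def simulate_noisy_csm(mic_pos, src_pos, freq, noise_var, n_snapshots, rng):
    """Hypothetical simulation: one monopole plus uncorrelated sensor noise."""
    p = monopole_pressures(mic_pos, src_pos, freq)
    K, M = n_snapshots, p.size
    # Unit-amplitude source with an independent random phase per snapshot
    phase = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=(K, 1)))
    signal = phase * p[np.newaxis, :]  # (K, M)
    # Circularly symmetric complex Gaussian noise with E|n|^2 = noise_var,
    # independent across microphones (a stand-in for ambient noise)
    noise = np.sqrt(noise_var / 2.0) * (
        rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))
    )
    return estimate_csm(signal + noise), p

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 8-microphone line array along x with 10 cm spacing (assumed geometry)
    mics = np.column_stack([np.linspace(-0.35, 0.35, 8), np.zeros(8), np.zeros(8)])
    source = np.array([0.1, 0.0, 1.0])
    csm, p = simulate_noisy_csm(mics, source, 2000.0, 0.01, 20000, rng)
    residual = csm - np.outer(p, p.conj())  # deviation from noise-free p p^H
    print("shape:", csm.shape)
    print("mean diagonal residual:", residual.diagonal().real.mean())
    print("max off-diagonal residual:",
          np.abs(residual - np.diag(residual.diagonal())).max())
```

Uncorrelated noise is the simplest case to isolate. Coherent effects such as reflections behave differently: an image source is correlated with the direct sound, so it alters the off-diagonal entries as well.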
