Deep learning-based filtering of cross-spectral matrices using generative adversarial networks

Christof Puhle

TL;DR

The paper addresses filtering cross-spectral matrices from microphone arrays to mitigate ambient noise, reflections, and source directivity before sound source localization (SSL). It introduces a complex-valued GAN-based framework whose generator and discriminator learn transformations between input and target cross-spectral matrices, guided by a transformation map and a dedicated loss that combines adversarial and transformation-consistency terms. Five transformation tasks derived from simulated sound-pressure data show high accuracy on test sets, with the auto-encoder baseline performing best and the directivity-removal tasks proving the most challenging. This preprocessing step could improve downstream SSL methods and standard beamforming techniques by providing cleaner, phase-preserving cross-spectral representations.

Abstract

In this paper, we present a deep-learning method to filter out effects such as ambient noise, reflections, or source directivity from microphone array data represented as cross-spectral matrices. Specifically, we focus on a generative adversarial network (GAN) architecture designed to transform fixed-size cross-spectral matrices. These models were trained using sound pressure simulations of varying complexity developed for this purpose. Based on the results from applying these methods in a hyperparameter optimization of an auto-encoding task, we trained the optimized model to perform five distinct transformation tasks derived from different complexities inherent in our sound pressure simulations.
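For readers unfamiliar with the data representation, the following is a minimal sketch of how a cross-spectral matrix can be estimated from multichannel sound pressure signals by averaging over windowed blocks (Welch-style). It is an illustration only and not the estimator used in the paper: the function name `cross_spectral_matrix`, the Hann window, the non-overlapping blocks, the missing amplitude normalization, and the random test signals are all assumptions made here.

```python
import numpy as np


def cross_spectral_matrix(signals, block_size):
    """Estimate cross-spectral matrices by block averaging (illustrative sketch).

    signals: real array of shape (n_mics, n_samples)
    returns: complex array of shape (n_freqs, n_mics, n_mics)
    """
    n_mics, n_samples = signals.shape
    n_blocks = n_samples // block_size

    # Cut each channel into non-overlapping blocks: (n_mics, n_blocks, block_size).
    blocks = signals[:, : n_blocks * block_size].reshape(n_mics, n_blocks, block_size)

    # Window each block and take the one-sided FFT: (n_mics, n_blocks, n_freqs).
    window = np.hanning(block_size)
    spectra = np.fft.rfft(blocks * window, axis=-1)

    # C[f, i, j] = average over blocks of P_i(f) * conj(P_j(f)).
    # No amplitude or power-density scaling is applied in this sketch.
    return np.einsum("ibf,jbf->fij", spectra, spectra.conj()) / n_blocks


# Hypothetical input: 4 microphones recording uncorrelated white noise.
rng = np.random.default_rng(0)
signals = rng.standard_normal((4, 4096))
csm = cross_spectral_matrix(signals, block_size=256)
```

Each frequency slice `csm[f]` is a Hermitian matrix whose diagonal holds the real, non-negative auto-spectra and whose off-diagonal entries carry the relative phase between microphone pairs. That phase information is what the summary above refers to as "phase-preserving".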

Paper Structure

This paper contains 13 sections, 36 equations, 5 figures.

Figures (5)

  • Figure 1: Accuracy scatter plot for transformation task 1) (auto-encoder)
  • Figure 2: Accuracy scatter plot for transformation task 2) (ambient sound)
  • Figure 3: Accuracy scatter plot for transformation task 3) (reflections)
  • Figure 4: Accuracy scatter plot for transformation task 4) (directivity)
  • Figure 5: Accuracy scatter plot for transformation task 5) (directivity, reflections and ambient sound)