Neural Directional Filtering: Far-Field Directivity Control With a Small Microphone Array

Julian Wechsler, Srikanth Raj Chetupalli, Mhd Modar Halimeh, Oliver Thiergart, Emanuël A. P. Habets

TL;DR

This work addresses controlling far-field directivity with a compact microphone array by learning a target pattern through neural directional filtering. A lightweight FT-JNF architecture predicts a single-channel complex mask $\mathcal{M}$ from the multi-channel inputs; the mask is applied to the reference microphone signal $Y_{1}$ to estimate the target signal $Z_{\textrm{VDM}}$, which exhibits the desired pattern, via $\widehat{Z}_{\textrm{VDM}}[f,t] = \mathcal{M}[f,t]\,Y_{1}[f,t]$. The study investigates how training data composition affects pattern realization and demonstrates that the method can closely approximate cardioid and higher-order DMA patterns using few microphones, outperforming traditional parametric baselines in most cases. Results indicate strong performance when trained on multi-speaker mixtures, with best mean SDRs of $26.2$ dB for cardioid and $18.4$ dB for the $3^{\textrm{rd}}$-order DMA, suggesting practical impact for flexible spatial audio capture and reproduction. Future work includes steerable/arbitrary patterns, near-field and reverberant scenarios, measured-data validation, and exploring VDM placement strategies.
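The masking step $\widehat{Z}_{\textrm{VDM}}[f,t] = \mathcal{M}[f,t]\,Y_{1}[f,t]$ is an element-wise complex product in the time-frequency domain. The NumPy sketch below illustrates only that step and an SDR check; it is not the authors' implementation. The function names, the random stand-in for the reference spectrogram, the constant "ideal" mask (a single far-field source at 60° under a cardioid gain), and the plain energy-ratio SDR definition are all assumptions made for illustration, and the DNN that would normally predict the mask is omitted.

```python
import numpy as np


def apply_mask(mask: np.ndarray, y_ref: np.ndarray) -> np.ndarray:
    """Element-wise complex masking: Z_hat[f, t] = M[f, t] * Y_1[f, t]."""
    if mask.shape != y_ref.shape:
        raise ValueError("mask and reference spectrogram must have the same shape")
    return mask * y_ref


def sdr_db(target: np.ndarray, estimate: np.ndarray, eps: float = 1e-12) -> float:
    """Energy-ratio SDR in dB (assumed definition; the paper's may differ)."""
    num = np.sum(np.abs(target) ** 2)
    den = np.sum(np.abs(target - estimate) ** 2)
    return float(10.0 * np.log10((num + eps) / (den + eps)))


rng = np.random.default_rng(0)
n_freq, n_frames = 257, 100  # hypothetical STFT size

# Random stand-in for the reference-microphone STFT Y_1[f, t].
y_ref = rng.standard_normal((n_freq, n_frames)) + 1j * rng.standard_normal((n_freq, n_frames))

# Hypothetical target: one far-field source at 60 degrees, scaled by a cardioid gain.
gain = 0.5 + 0.5 * np.cos(np.deg2rad(60.0))
z_target = gain * y_ref

# For a single source the ideal mask is the constant gain; a DNN would estimate it.
ideal_mask = np.full((n_freq, n_frames), gain, dtype=complex)
z_hat = apply_mask(ideal_mask, y_ref)

# A perturbed mask stands in for an imperfect network estimate.
noisy_mask = ideal_mask + 0.05 * rng.standard_normal((n_freq, n_frames))
z_hat_noisy = apply_mask(noisy_mask, y_ref)

print(f"SDR with ideal mask:     {sdr_db(z_target, z_hat):.1f} dB")
print(f"SDR with perturbed mask: {sdr_db(z_target, z_hat_noisy):.1f} dB")
```

With several concurrently active sources the ideal mask is no longer constant over time and frequency, which is why a mask is estimated per time-frequency bin from the multi-channel input.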

Abstract

Capturing audio signals with specific directivity patterns is essential in speech communication. This study presents a deep neural network (DNN)-based approach to directional filtering, alleviating the need for explicit signal models. More specifically, our proposed method uses a DNN to estimate a single-channel complex mask from the signals of a microphone array. This mask is then applied to a reference microphone to render a signal that exhibits a desired directivity pattern. We investigate the training dataset composition and its effect on the directivity realized by the DNN during inference. Using a relatively small DNN, the proposed method is found to approximate the desired directivity pattern closely. Additionally, it allows for the realization of higher-order directivity patterns using a small number of microphones, which is a difficult task for linear and parametric directional filtering.

Paper Structure

This paper contains 14 sections, 6 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Logarithmic polar plots of the DMA patterns considered in this study.
  • Figure 2: Distributions of SDR values when testing the best model for (a) the cardioid target and (b) the $3^{\textrm{rd}}$-order DMA target. The models were tested with two concurrently active sources. The sources' DOAs are given on the axes; SDR values for missing DOA combinations were filled in using cubic interpolation.
  • Figure 3: Polar plot of the two VDM targets and their corresponding DNN estimate. The gray area illustrates the standard deviation.
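
As a point of reference for the patterns shown in Figure 1, a DMA directivity pattern is commonly written as a polynomial in $\cos\theta$, and the first-order cardioid is $0.5 + 0.5\cos\theta$. The sketch below evaluates that cardioid and converts it to a floored logarithmic scale, as would be used for a logarithmic polar plot. The helper names, the 5° grid, and the $-40$ dB floor are assumptions for illustration. The coefficients of the $3^{\textrm{rd}}$-order DMA used in the paper are not stated in this summary, so only the cardioid is instantiated.

```python
import numpy as np


def dma_pattern(theta, coeffs):
    """Evaluate B(theta) = sum_n a_n * cos(theta)**n for coefficients a_0..a_N."""
    c = np.cos(theta)
    return sum(a * c**n for n, a in enumerate(coeffs))


def to_log_db(pattern, floor_db=-40.0):
    """Convert a linear pattern to dB, clipped at an assumed display floor."""
    mag = np.maximum(np.abs(pattern), 1e-12)  # avoid log10(0) at pattern nulls
    return np.maximum(20.0 * np.log10(mag), floor_db)


# Direction-of-arrival grid in 5-degree steps (hypothetical resolution).
theta = np.deg2rad(np.arange(0, 360, 5))

# First-order cardioid: a_0 = a_1 = 0.5, unit gain at 0 degrees, null at 180 degrees.
cardioid = dma_pattern(theta, [0.5, 0.5])
cardioid_db = to_log_db(cardioid)

for deg in (0, 90, 180):
    idx = deg // 5
    print(f"{deg:3d} deg: linear {cardioid[idx]:.3f}, {cardioid_db[idx]:.1f} dB")
```

A higher-order pattern would be obtained by passing a longer coefficient list to `dma_pattern`, once the coefficients are known.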