Neural Directional Filtering: Far-Field Directivity Control With a Small Microphone Array
Julian Wechsler, Srikanth Raj Chetupalli, Mhd Modar Halimeh, Oliver Thiergart, Emanuël A. P. Habets
TL;DR
This work addresses far-field directivity control with a compact microphone array by learning a target directivity pattern through neural directional filtering. A lightweight FT-JNF architecture predicts a single-channel complex mask $\mathcal{M}[f,t]$ from the multi-channel input. It applies the mask to the reference microphone signal $Y_{1}[f,t]$ to estimate the desired virtual directional microphone (VDM) signal $Z_{\textrm{VDM}}$: $\widehat{Z}_{\textrm{VDM}}[f,t] = \mathcal{M}[f,t]\,Y_{1}[f,t]$. The study investigates how the composition of the training data affects the realized pattern. It shows that the method closely approximates cardioid and higher-order differential microphone array (DMA) patterns with few microphones and outperforms traditional parametric baselines in most cases. Performance is strongest when training on multi-speaker mixtures, with best mean SDRs of $26.2$ dB for the cardioid and $18.4$ dB for the $3^{\textrm{rd}}$-order DMA pattern. This points to practical use in flexible spatial audio capture and reproduction. Future work includes steerable and arbitrary patterns, near-field and reverberant scenarios, validation on measured data, and VDM placement strategies.
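The masking step in the formula above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation. The function names `apply_mask` and `sdr_db` and the random placeholder STFTs are hypothetical. The SDR is a plain energy-ratio SDR, because the exact SDR variant used in the study is not specified here. The oracle mask $Z_{\textrm{VDM}}/Y_1$ only shows what a perfect DNN output would look like.

```python
import numpy as np


def apply_mask(mask: np.ndarray, y_ref: np.ndarray) -> np.ndarray:
    """Apply a single-channel complex mask to the reference-mic STFT.

    Implements Z_hat[f, t] = M[f, t] * Y_1[f, t] element-wise.
    Both arrays are assumed to have shape (F, T).
    """
    if mask.shape != y_ref.shape:
        raise ValueError("mask and reference STFT must have the same (F, T) shape")
    return mask * y_ref


def sdr_db(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Plain SDR in dB: 10 log10(||z||^2 / ||z - z_hat||^2).

    Assumption: a simple energy-ratio SDR; the paper's exact variant may differ.
    """
    error = reference - estimate
    signal_energy = np.sum(np.abs(reference) ** 2)
    error_energy = np.sum(np.abs(error) ** 2)
    return float(10.0 * np.log10(signal_energy / error_energy))


# Usage with random placeholder STFTs (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
F, T = 257, 100
y1 = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))     # reference mic
z_vdm = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))  # desired VDM signal

oracle_mask = z_vdm / y1              # what an ideal DNN would predict
z_hat = apply_mask(oracle_mask, y1)   # recovers z_vdm up to rounding
noisy_estimate = 0.9 * z_vdm          # estimate with a 10 % amplitude error
print(sdr_db(z_vdm, noisy_estimate))  # ≈ 20.0 dB, since ||0.1 z||^2 / ||z||^2 = 0.01
```

A complex mask can correct both magnitude and phase. A real-valued mask could not reproduce an arbitrary $Z_{\textrm{VDM}}/Y_1$ ratio.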
Abstract
Capturing audio signals with specific directivity patterns is essential in speech communication. This study presents a deep neural network (DNN)-based approach to directional filtering that removes the need for explicit signal models. More specifically, our proposed method uses a DNN to estimate a single-channel complex mask from the signals of a microphone array. This mask is then applied to a reference microphone to render a signal that exhibits a desired directivity pattern. We investigate how the composition of the training dataset affects the directivity realized by the DNN during inference. With a relatively small DNN, the proposed method closely approximates the desired directivity pattern. It also realizes higher-order directivity patterns with a small number of microphones, a task that is difficult for linear and parametric directional filtering.
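To make the desired directivity patterns concrete, the sketch below evaluates a first-order cardioid and a generic higher-order DMA-style pattern $D(\theta)=\sum_{n=0}^{N} a_n \cos^n\theta$. It then builds a target by weighting each far-field source's reference-microphone signal with the pattern gain at its direction of arrival, $Z_{\textrm{VDM}}[f,t]=\sum_s D(\theta_s)\,X_{s,1}[f,t]$.

This target definition is an illustrative assumption, not the paper's exact formulation. It assumes a frequency-independent, azimuth-only pattern and access to per-source reference-microphone STFTs, as in simulated training data. The function names are hypothetical. The third-order coefficients, a third-order cardioid $(0.5+0.5\cos\theta)^3$, are a hypothetical example, because the exact third-order pattern used in the study is not given here.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def cardioid(theta: np.ndarray) -> np.ndarray:
    """First-order cardioid: D(theta) = 0.5 + 0.5 cos(theta), look direction theta = 0."""
    return 0.5 + 0.5 * np.cos(theta)


def dma_pattern(theta: np.ndarray, coeffs) -> np.ndarray:
    """N-th order DMA-style pattern D(theta) = sum_n a_n cos^n(theta).

    coeffs = [a_0, a_1, ..., a_N]; polyval evaluates a_0 + a_1 x + ... at x = cos(theta).
    """
    return P.polyval(np.cos(theta), coeffs)


def vdm_target(source_ref_stfts: np.ndarray, doas, pattern) -> np.ndarray:
    """Hypothetical target: Z_VDM[f, t] = sum_s D(theta_s) X_{s,1}[f, t].

    source_ref_stfts: (S, F, T) per-source STFTs at the reference mic (assumed available).
    doas: length-S sequence of azimuths in radians.
    pattern: callable mapping angles to real-valued gains.
    """
    gains = pattern(np.asarray(doas, dtype=float))          # (S,) pattern gain per source
    return np.einsum("s,sft->ft", gains, source_ref_stfts)   # gain-weighted sum over sources


# Hypothetical third-order example: third-order cardioid (0.5 + 0.5 cos)^3 expanded in cos^n.
third_order_coeffs = [0.125, 0.375, 0.375, 0.125]

rng = np.random.default_rng(1)
x = rng.standard_normal((2, 129, 50)) + 1j * rng.standard_normal((2, 129, 50))

# Source 0 in the look direction, source 1 from the rear: the cardioid fully suppresses source 1.
z = vdm_target(x, [0.0, np.pi], cardioid)

# Same mixture rendered with the hypothetical third-order pattern (source 1 at 90 degrees).
z3 = vdm_target(x, [0.0, np.pi / 2], lambda th: dma_pattern(th, third_order_coeffs))
```

Setting `coeffs = [0.5, 0.5]` in `dma_pattern` reproduces the first-order cardioid, so one polynomial form covers both the cardioid and higher-order cases.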
