Neural Directed Speech Enhancement with Dual Microphone Array in High Noise Scenario

Wen Wen; Qiang Zhou; Yu Xi; Haoyu Li; Ziqi Gong; Kai Yu

TL;DR

The paper introduces a causal-directed U-Net (CDUNet) that takes raw multi-channel speech and a desired enhancement width as inputs. The model adjusts its steering vectors dynamically to follow the target direction and tunes the enhancement region according to the angular separation between the target and interfering signals.

Abstract

In multi-speaker scenarios, leveraging spatial features is essential for enhancing target speech. With a limited number of microphones, however, developing a compact multi-channel speech enhancement system remains challenging, especially in extremely low signal-to-noise ratio (SNR) conditions. To tackle this issue, we propose a triple-steering spatial selection method, a flexible framework that uses three steering vectors to guide enhancement and determine the enhancement range. Specifically, we introduce a causal-directed U-Net (CDUNet) model, which takes raw multi-channel speech and the desired enhancement width as inputs. This enables dynamic adjustment of the steering vectors based on the target direction and fine-tuning of the enhancement region according to the angular separation between the target and interference signals. Using only a dual-microphone array, our model excels in both speech quality and downstream task performance. It operates in real time with few parameters, making it well suited to low-latency, on-device streaming applications.
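
To make the triple-steering idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a far-field (plane-wave) model, a two-microphone linear array with a hypothetical 5 cm spacing, and that the three steering vectors sit at the target direction and at the two edges of the enhancement width. The abstract does not specify any of these details, and the function names `steering_vector` and `triple_steering` are illustrative only.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def steering_vector(angle_deg, freqs_hz, mic_spacing_m):
    """Far-field steering vector of a 2-mic linear array, one pair per frequency.

    The first microphone is the phase reference; the second carries the
    phase shift caused by the inter-microphone delay at the given angle.
    """
    delay = mic_spacing_m * math.cos(math.radians(angle_deg)) / SPEED_OF_SOUND
    return [(1 + 0j, cmath.exp(-2j * math.pi * f * delay)) for f in freqs_hz]


def triple_steering(target_deg, width_deg, freqs_hz, mic_spacing_m=0.05):
    """Return (angle, steering vector) for the lower edge, centre, and upper edge.

    Placing the outer vectors at target +/- width/2 is an assumption made
    for illustration; the paper may define the enhancement range differently.
    """
    half = width_deg / 2.0
    angles = (target_deg - half, target_deg, target_deg + half)
    return [(a, steering_vector(a, freqs_hz, mic_spacing_m)) for a in angles]


# Frequency bins of a 512-point FFT at 16 kHz (assumed analysis settings).
freqs = [k * 16000 / 512 for k in range(257)]
vectors = triple_steering(target_deg=90.0, width_deg=30.0, freqs_hz=freqs)
for angle, sv in vectors:
    print(angle, len(sv))
```

In this sketch, narrowing `width_deg` pulls the two outer steering vectors toward the target direction, which mirrors the abstract's description of tightening the enhancement region when the interferer is angularly close to the target.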
