Monaural speech enhancement on drone via Adapter based transfer learning

Xingyu Chen; Hanwen Bi; Wei-Ting Lai; Fei Ma

TL;DR

This paper proposes a frequency-domain bottleneck adapter that effectively enhances speech quality and is a more efficient alternative to fine-tuning models for various drone types, since fine-tuning requires substantial computational resources.

Abstract

Monaural speech enhancement on drones is challenging because the ego-noise from the rotating motors and propellers leads to extremely low signal-to-noise ratios (SNRs) at onboard microphones. Although recent masking-based deep neural network methods excel in monaural speech enhancement, they struggle in drone noise scenarios. Furthermore, existing drone noise datasets are limited, causing models to overfit. Considering the harmonic nature of drone noise, this paper proposes a frequency-domain bottleneck adapter to enable transfer learning. Specifically, the adapter's parameters are trained on drone noise while the parameters of the pre-trained Frequency Recurrent Convolutional Recurrent Network (FRCRN) are kept fixed. Evaluation results demonstrate that the proposed method can effectively enhance speech quality. Moreover, it is a more efficient alternative to fine-tuning models for various drone types, since fine-tuning typically requires substantial computational resources.
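The abstract does not spell out the adapter's internals. As a minimal sketch, assuming a standard residual bottleneck (down-projection, ReLU, up-projection, skip connection) applied along the frequency axis of a frozen layer's output, the idea can be written in NumPy as follows. The class name `FrequencyBottleneckAdapter`, the 257-bin feature width, the bottleneck size of 32, and the zero-initialised up-projection are illustrative assumptions rather than details from the paper, and the random matrix `w_frozen` merely stands in for one frozen FRCRN layer.

```python
import numpy as np


class FrequencyBottleneckAdapter:
    """Hypothetical residual bottleneck adapter acting on the frequency axis.

    Each frame's F frequency features are projected down to r << F values,
    passed through a ReLU, projected back up to F, and added to the input.
    """

    def __init__(self, num_freqs: int, bottleneck: int, rng: np.random.Generator):
        # Down-projection F -> r with a small random initialisation (assumption).
        self.w_down = rng.normal(0.0, 0.02, size=(num_freqs, bottleneck))
        self.b_down = np.zeros(bottleneck)
        # Up-projection r -> F set to zero, so the adapter is an identity map at start.
        self.w_up = np.zeros((bottleneck, num_freqs))
        self.b_up = np.zeros(num_freqs)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (..., F); the last axis indexes frequency bins.
        h = np.maximum(x @ self.w_down + self.b_down, 0.0)
        return x + h @ self.w_up + self.b_up

    def parameters(self) -> list:
        # Only these arrays would be updated during adapter tuning.
        return [self.w_down, self.b_down, self.w_up, self.b_up]

    def num_params(self) -> int:
        return sum(p.size for p in self.parameters())


rng = np.random.default_rng(0)
num_freqs, bottleneck = 257, 32  # hypothetical sizes (e.g. a 512-point STFT)

# Stand-in for the output of one frozen FRCRN layer; its weights are never updated.
w_frozen = rng.normal(size=(num_freqs, num_freqs))
features = rng.normal(size=(4, 100, num_freqs)) @ w_frozen  # (channels, frames, freq)

adapter = FrequencyBottleneckAdapter(num_freqs, bottleneck, rng)
adapted = adapter(features)

print(adapted.shape)                   # (4, 100, 257)
print(np.allclose(adapted, features))  # True: zero-initialised up-projection
print(adapter.num_params())            # 16737 trainable values
```

Because the up-projection starts at zero, the adapted network initially reproduces the frozen backbone exactly, and tuning moves it only as far as the drone-noise data requires. Each insertion point adds 2Fr + F + r trainable values (16,737 for F = 257, r = 32). The identity initialisation is a common adapter design choice, not something the source confirms for this paper.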

Paper Structure

This paper contains 10 sections, 7 equations, 3 figures, and 2 tables.

Introduction
Problem Formulation
Methodology
Transfer Learning with FRCRN
Adapter Tuning on FRCRN
Loss Function
Experiments
Dataset
Evaluation
Conclusions

Figures (3)

Figure 1: Problem setup: (a) illustration of the monaural speech recording scenario with a flying drone; (b) drone ego-noise in the frequency domain; (c) noisy speech in the time domain; (d) noisy speech in the time-frequency domain.
Figure 2: Overview of the adapter pipeline: (a) pre-trained FRCRN with the adapter embedded; (b) Adapter; (c) $\mathrm{Adapter}_r$.
Figure 3: Results comparison: (a) clean speech; (b) noisy speech; (c) speech enhanced by adapter tuning; (d) noisy speech and adapter-tuning-enhanced speech in the time domain.
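Figure 1 shows harmonic drone ego-noise (panel b) burying the speech at very low SNR (panels c and d). A minimal sketch of that setup, assuming the usual additive model y(t) = s(t) + n(t), synthesises hypothetical harmonic noise from a fundamental frequency plus weak broadband noise and then scales it to a target SNR. The function names `harmonic_ego_noise` and `mix_at_snr`, the 150 Hz fundamental, the 20 harmonics with 1/k amplitude roll-off, the -15 dB target, and the modulated tone used as a speech placeholder are all illustrative assumptions, not values from the paper.

```python
import numpy as np


def harmonic_ego_noise(n_samples, fs, f0, num_harmonics, rng):
    """Hypothetical drone ego-noise: harmonics of f0 with random phases plus weak broadband noise."""
    t = np.arange(n_samples) / fs
    noise = np.zeros(n_samples)
    for k in range(1, num_harmonics + 1):
        phase = rng.uniform(0.0, 2.0 * np.pi)
        noise += np.sin(2.0 * np.pi * k * f0 * t + phase) / k  # 1/k amplitude roll-off
    return noise + 0.1 * rng.standard_normal(n_samples)


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_s / P_n) equals snr_db, then return y = s + n."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise


fs, n_samples = 16000, 16000  # one second at 16 kHz (assumed)
rng = np.random.default_rng(0)
t = np.arange(n_samples) / fs

# Placeholder "speech": a 440 Hz tone with a slow 3 Hz amplitude envelope, not real speech.
speech = (0.5 + 0.5 * np.sin(2.0 * np.pi * 3.0 * t)) * np.sin(2.0 * np.pi * 440.0 * t)
noise = harmonic_ego_noise(n_samples, fs, f0=150.0, num_harmonics=20, rng=rng)
noisy, scaled_noise = mix_at_snr(speech, noise, snr_db=-15.0)

achieved = 10.0 * np.log10(np.mean(speech ** 2) / np.mean(scaled_noise ** 2))
peak_hz = np.argmax(np.abs(np.fft.rfft(noise))) * fs / n_samples
print(round(achieved, 6))  # -15.0
print(peak_hz)             # 150.0: the fundamental dominates the noise spectrum
```

At the hypothetical -15 dB target, the noise power is 10^1.5 ≈ 31.6 times the speech power, and the noise energy concentrates in narrow peaks at multiples of the fundamental. This concentration is the harmonic structure that motivates operating in the frequency domain.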
