Dynamic Gated Recurrent Neural Network for Compute-efficient Speech Enhancement

Longbiao Cheng, Ashutosh Pandey, Buye Xu, Tobi Delbruck, Shih-Chii Liu

TL;DR

Tested on several state-of-the-art compute-efficient RNN-based speech enhancement architectures with the DNS challenge dataset, the D-GRU based model variants achieve speech intelligibility and quality metrics comparable to those of the baseline GRU based models, even with an average 50% reduction in GRU computation.

Abstract

This paper introduces a new Dynamic Gated Recurrent Neural Network (DG-RNN) for compute-efficient speech enhancement models running on resource-constrained hardware platforms. It exploits the slow evolution of RNN hidden states across steps: a newly proposed select gate, added to the RNN model, chooses a subset of neurons to update at each step, and the remaining neurons keep their previous values. This select gate reduces the computation cost of the conventional RNN during network inference. As a realization of the DG-RNN, we further propose the Dynamic Gated Recurrent Unit (D-GRU), which does not require additional parameters. Tested on several state-of-the-art compute-efficient RNN-based speech enhancement architectures with the DNS challenge dataset, the D-GRU based model variants achieve speech intelligibility and quality metrics comparable to those of the baseline GRU based models, even with an average 50% reduction in GRU computation.
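
The selective update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `select_gate` and `dg_rnn_step`, the per-neuron `scores`, and the top-k selection rule are all hypothetical choices made for illustration, since the text above does not specify how the select gate is computed.

```python
import math


def select_gate(scores, update_fraction):
    """Build a binary select gate: 1 for the highest-scoring neurons, 0 otherwise.

    ASSUMPTION: ranking neurons by a per-neuron score and keeping the top
    fraction is an illustrative rule, not the rule stated in the paper.
    """
    n = len(scores)
    k = max(1, math.ceil(update_fraction * n))  # number of neurons to update
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    gate = [0] * n
    for i in top:
        gate[i] = 1
    return gate


def dg_rnn_step(h_prev, gate, update_neuron):
    """Run the per-neuron update only where gate == 1; copy the other neurons.

    `update_neuron(i, h_prev)` is a hypothetical stand-in for the conventional
    RNN update of neuron i. Skipped neurons cost no update computation.
    """
    h_new = list(h_prev)
    n_updates = 0
    for i, g in enumerate(gate):
        if g == 1:
            h_new[i] = update_neuron(i, h_prev)
            n_updates += 1
    return h_new, n_updates


# Toy example: update half of a 4-neuron hidden state.
h_prev = [0.1, -0.4, 0.7, 0.0]
scores = [0.9, 0.2, 0.6, 0.1]  # hypothetical importance scores
gate = select_gate(scores, 0.5)
h_new, n_updates = dg_rnn_step(h_prev, gate, lambda i, h: math.tanh(h[i] + 1.0))
```

In this toy run the update function is called for only two of the four neurons, which is the source of the compute saving; the other two entries of `h_new` are copied unchanged from `h_prev`.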

Paper Structure (13 sections, 7 equations, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Illustration of the update processes of (A) conventional Recurrent Neural Network (RNN) and (B) Dynamic Gated Recurrent Neural Network (DG-RNN) at step $t$. (A) For conventional RNN, all neurons in the hidden state are updated at each step. (B) DG-RNN first identifies which neurons need updating, as indicated by a $1$ in the proposed select gate $\boldsymbol{g}_t$. When a neuron is marked with $1$, it undergoes the RNN update process. Those marked with $0$ retain their values from the previous hidden state.
  • Figure 2: Mean PESQ improvement ($\Delta_{PESQ}$) of different update percentages $\mathcal{P}$ on different models under {-5, 0, 5} dB SNRs.
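
The update shown in Figure 1(B) can be written compactly as follows. This is a sketch inferred from the caption rather than an equation quoted from the paper: the symbols $\tilde{\boldsymbol{h}}_t$ (the hidden state a conventional RNN update would produce), $N$ (the number of hidden neurons), and the reading of $\mathcal{P}$ as the percentage of ones in $\boldsymbol{g}_t$ are assumptions.

```latex
\begin{align}
  \boldsymbol{h}_t &= \boldsymbol{g}_t \odot \tilde{\boldsymbol{h}}_t
      + (\boldsymbol{1} - \boldsymbol{g}_t) \odot \boldsymbol{h}_{t-1},
      \qquad \boldsymbol{g}_t \in \{0, 1\}^N, \\
  \mathcal{P} &= \frac{100}{N} \sum_{i=1}^{N} g_{t,i} \quad (\text{assumed definition}).
\end{align}
```

Only the entries of $\tilde{\boldsymbol{h}}_t$ where $g_{t,i} = 1$ need to be computed, so the cost of the update step scales with the number of selected neurons rather than with $N$.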