
Improving Speech Enhancement by Integrating Inter-Channel and Band Features with Dual-branch Conformer

Jizhen Li, Xinmeng Xu, Weiping Tu, Yuhong Yang, Rong Zhu

TL;DR

A novel dual-branch architecture, the channel-aware dual-branch conformer (CADB-Conformer), is proposed. It models long-range time and frequency correlations among different channels to extract channel-relation-aware time-frequency information.

Abstract

Recent speech enhancement methods based on convolutional neural networks (CNNs) and transformers have been shown to capture time-frequency (T-F) information from spectrograms effectively. However, the correlations among the channels of speech features have not yet been explored. In principle, each channel map of the speech features, obtained with a different convolution kernel, carries information at a different scale, and these maps are strongly correlated. To fill this gap, we propose a novel dual-branch architecture, the channel-aware dual-branch conformer (CADB-Conformer), which explores long-range time and frequency correlations among different channels to extract channel-relation-aware T-F information. Ablation studies on the DNS-Challenge 2020 dataset demonstrate the importance of leveraging channel features and the significance of channel-relation-aware T-F information for speech enhancement. Extensive experiments also show that the proposed model outperforms recent methods at an attractive computational cost.
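To make "correlations among channels" concrete, here is a minimal sketch of a generic channel self-attention in plain Python. It assumes each of the C channel maps (T frames by F frequency bins) is flattened into one vector and treated as a token, so the attention matrix is C x C rather than over time or frequency. The function names, the scaled dot-product affinity, and the residual weight `gamma` are assumptions borrowed from standard channel-attention designs; the summary does not specify how the paper's Self-CA (Figure 3b) is actually computed.

```python
import math
from typing import List

# Hypothetical sketch of a generic channel self-attention; NOT the paper's Self-CA.
Matrix = List[List[float]]


def flatten_tf(features: List[Matrix]) -> Matrix:
    """Flatten a C x T x F feature tensor into C rows of length T*F."""
    return [[v for frame in channel for v in frame] for channel in features]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def channel_affinity(x: Matrix) -> Matrix:
    """C x C matrix; row i holds softmax-normalized similarities of channel i to every channel."""
    scale = 1.0 / math.sqrt(len(x[0]))  # scaled dot product (assumed)
    return [
        softmax([scale * sum(a * b for a, b in zip(xi, xj)) for xj in x])
        for xi in x
    ]


def channel_self_attention(x: Matrix, gamma: float = 1.0) -> Matrix:
    """Re-express each channel as an affinity-weighted mix of all channels, plus a residual.

    gamma is a fixed stand-in for what would usually be a learned weight.
    """
    attn = channel_affinity(x)
    c, n = len(x), len(x[0])
    mixed = [
        [sum(attn[i][j] * x[j][k] for j in range(c)) for k in range(n)]
        for i in range(c)
    ]
    return [[gamma * mixed[i][k] + x[i][k] for k in range(n)] for i in range(c)]


if __name__ == "__main__":
    # Toy input: 2 channels, 2 frames, 2 frequency bins
    feats = [[[1.0, 0.0], [0.5, 0.2]], [[0.9, 0.1], [0.4, 0.3]]]
    x = flatten_tf(feats)
    print(channel_affinity(x))
    print(channel_self_attention(x))
```

Because every T-F bin of a channel contributes to its token, each output channel can draw on information from all other channels regardless of where in the spectrogram it occurs, which is one plausible reading of "channel relation aware" features.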

Paper Structure

This paper contains 14 sections, 4 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Overall enhancement process of the proposed CADB-Conformer. (a) An overview of the proposed CADB-Conformer architecture. (b) The encoder architecture of CADB-Conformer. (c) Complex decoder unit. (d) Mask decoder unit.
  • Figure 2: Details of the proposed CADB-Conformer module, which contains a Channel Feature Branch (CFB) and a Band Feature Branch (BFB).
  • Figure 3: (a) Architecture of the Channel Feature Branch. (b) Details of Self-Channel Attention (Self-CA).
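Figure 1 lists separate complex and mask decoder units, but this summary does not say how their outputs are merged. As an illustration only, the sketch below follows a pattern used by other dual-decoder enhancement models such as CMGAN: the mask decoder's magnitude mask scales the noisy magnitude, the noisy phase is reused, and the complex decoder's real/imaginary output is added as a refinement. Whether CADB-Conformer fuses its decoders this way is an assumption, and `combine_decoder_outputs` is a hypothetical name.

```python
import cmath
from typing import List


def combine_decoder_outputs(
    noisy: List[complex], mask: List[float], residual: List[complex]
) -> List[complex]:
    """Hypothetical fusion of a magnitude-mask decoder and a complex decoder.

    noisy:    noisy STFT bins (one complex value per T-F bin)
    mask:     magnitude mask from the mask decoder (one value per bin)
    residual: complex-valued correction from the complex decoder
    """
    if not (len(noisy) == len(mask) == len(residual)):
        raise ValueError("all inputs must have one value per T-F bin")
    enhanced = []
    for y, m, r in zip(noisy, mask, residual):
        # Coarse estimate: masked magnitude combined with the noisy phase
        coarse = cmath.rect(m * abs(y), cmath.phase(y))
        # Refine by adding the complex decoder's real/imaginary estimate
        enhanced.append(coarse + r)
    return enhanced


if __name__ == "__main__":
    print(combine_decoder_outputs([3 + 4j, 1j], [0.5, 1.0], [0j, 0.1 + 0j]))
```

The mask path only rescales magnitudes, so it cannot correct phase on its own; adding a complex residual is one way to let a second decoder repair phase errors that masking leaves behind.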