FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement
Xiang Hao, Xiangdong Su, Radu Horaud, Xiaofei Li
TL;DR
FullSubNet addresses real-time single-channel speech enhancement by unifying a full-band model that captures global spectral context with a sub-band model that leverages signal stationarity and local spectral patterns. The two components are connected sequentially and trained jointly using the complex Ideal Ratio Mask as the target, enabling complementary information to be exploited across frequency and time. Evaluated on the DNS Challenge 2020 dataset, the approach outperforms both the individual full-band and sub-band baselines as well as several state-of-the-art methods, while maintaining real-time processing on CPU with a modest latency budget. The work demonstrates that integrating global and local spectral cues yields tangible gains in denoising and dereverberation, suggesting practical benefits for real-time speech enhancement systems.
Abstract
This paper proposes FullSubNet, a full-band and sub-band fusion model for single-channel real-time speech enhancement. Full-band and sub-band refer to models that take full-band and sub-band noisy spectral features as input and output full-band and sub-band speech targets, respectively. The sub-band model processes each frequency independently: its input consists of one frequency and several context frequencies, and its output is the prediction of the clean speech target for the corresponding frequency. These two types of models have distinct characteristics. The full-band model can capture the global spectral context and long-distance cross-band dependencies, but it lacks the ability to model signal stationarity and attend to local spectral patterns. The sub-band model has the opposite profile: it models signal stationarity and local spectral patterns but has no access to the global context. In the proposed FullSubNet, we connect a pure full-band model and a pure sub-band model sequentially and use practical joint training to integrate the advantages of both. We conducted experiments on the DNS Challenge (INTERSPEECH 2020) dataset to evaluate the proposed method. Experimental results show that full-band and sub-band information are complementary and that FullSubNet can effectively integrate them. Moreover, the performance of FullSubNet exceeds that of the top-ranked methods in the DNS Challenge (INTERSPEECH 2020).
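To make the sub-band input concrete, the following is a minimal NumPy sketch of how one frequency and its context frequencies could be gathered into a per-frequency input unit. The function name `unfold_sub_bands`, the parameter `num_neighbors`, the use of a magnitude spectrogram, and the reflect padding at the spectrum edges are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def unfold_sub_bands(noisy_mag, num_neighbors):
    """Build one sub-band input unit per frequency bin.

    noisy_mag: array of shape (F, T), a noisy magnitude spectrogram
               with F frequency bins and T frames (assumed layout).
    num_neighbors: number of context frequencies taken on each side
                   (hypothetical parameter, must be smaller than F).

    Returns an array of shape (F, 2 * num_neighbors + 1, T), where
    unit f holds frequency f in its center row and the neighboring
    frequencies around it.
    """
    num_freqs, _ = noisy_mag.shape
    width = 2 * num_neighbors + 1

    # Pad along the frequency axis so that edge bins also receive a
    # full set of context rows. Reflect padding is an assumption.
    padded = np.pad(
        noisy_mag,
        ((num_neighbors, num_neighbors), (0, 0)),
        mode="reflect",
    )

    # Slide a window of `width` rows over the padded spectrogram,
    # producing one unit per original frequency bin.
    return np.stack(
        [padded[f:f + width] for f in range(num_freqs)],
        axis=0,
    )


# Example: 4 frequency bins, 3 frames, 1 context frequency per side.
spec = np.arange(12, dtype=float).reshape(4, 3)
units = unfold_sub_bands(spec, num_neighbors=1)
```

Each of the F units can then be passed through the same sub-band model independently, which is what "processes each frequency independently" amounts to in this sketch.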
