Ultra Low Complexity Deep Learning Based Noise Suppression
Shrishti Saha Shetu, Soumitro Chakrabarty, Oliver Thiergart, Edwin Mabande
TL;DR
The paper tackles real-time speech enhancement (SE) on resource-constrained devices by reducing the computational load of deep neural networks while maintaining performance. It introduces a two-stage framework: Stage 1 employs a convolutional recurrent network to estimate an intermediate real-valued magnitude mask, and Stage 2 uses a lightweight CNN to refine the phase and produce a complex mask. Channelwise feature reorientation reduces the cost of the convolutional operations, and a modified power-law compression improves perceptual quality. The method achieves near-state-of-the-art noise suppression with a compact model (688K parameters), a real-time factor of about 0.127, and roughly 0.098 GMACS on a single Cortex-A53 core at 16 kHz. The design relies on a compact intermediate magnitude estimator built from two fully connected layers and a small second-stage CNN, enabling deployment on embedded devices without substantial loss in quality. Overall, the work offers a practical and scalable approach to resource-efficient SE with competitive performance.
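To put the reported real-time factor in context, the short sketch below uses the conventional definition (processing time divided by the duration of the audio processed). The 10 ms frame hop is an illustrative assumption, not a value taken from the paper, and the function names are hypothetical.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Conventional RTF: compute time divided by audio duration.

    Values below 1.0 mean the system runs faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def compute_time_per_hop_ms(rtf: float, hop_ms: float) -> float:
    """Average compute time spent per frame hop for a given RTF."""
    return rtf * hop_ms


REPORTED_RTF = 0.127  # figure quoted in the summary (single Cortex-A53 core)
ASSUMED_HOP_MS = 10.0  # assumption for illustration only

per_hop_ms = compute_time_per_hop_ms(REPORTED_RTF, ASSUMED_HOP_MS)
headroom = 1.0 - REPORTED_RTF  # fraction of the core left for other work

print(f"compute per {ASSUMED_HOP_MS:.0f} ms hop: {per_hop_ms:.2f} ms")
print(f"idle fraction of the core: {headroom:.3f}")
```

Under that assumed hop size, an RTF of 0.127 corresponds to a little over one millisecond of compute per hop, leaving most of the core free for other tasks.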
Abstract
This paper introduces an innovative method for reducing the computational complexity of deep neural networks in real-time speech enhancement on resource-constrained devices. The proposed approach uses a two-stage processing framework that employs channelwise feature reorientation to reduce the computational load of convolutional operations. Combined with a modified power-law compression technique for enhanced perceptual quality, the approach achieves noise suppression performance comparable to state-of-the-art methods with significantly lower computational requirements. Notably, our algorithm requires 3 to 4 times less computation and memory than prior state-of-the-art approaches.
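As background for the compression step mentioned above, here is a minimal sketch of standard power-law compression applied to complex spectral bins: the magnitude is raised to an exponent below one while the phase is left unchanged. This is the generic technique only; the paper's modified variant is not reproduced here, and the exponent 0.3 and the function names are assumptions chosen for illustration.

```python
import cmath


def power_law_compress(spectrum, alpha=0.3):
    """Compress magnitudes as |X|**alpha, keeping each bin's phase.

    alpha=0.3 is an assumed, illustrative exponent.
    """
    compressed = []
    for value in spectrum:
        magnitude, phase = cmath.polar(value)
        compressed.append(cmath.rect(magnitude ** alpha, phase))
    return compressed


def power_law_expand(compressed, alpha=0.3):
    """Invert the compression by raising magnitudes to 1/alpha."""
    expanded = []
    for value in compressed:
        magnitude, phase = cmath.polar(value)
        expanded.append(cmath.rect(magnitude ** (1.0 / alpha), phase))
    return expanded


# Hypothetical spectral bins spanning a wide dynamic range.
bins = [3 + 4j, -10j, 0j, 0.01 + 0j]
compressed_bins = power_law_compress(bins)
restored_bins = power_law_expand(compressed_bins)

# Ratio between the loudest and quietest non-zero bin, before and after.
ratio_before = abs(bins[1]) / abs(bins[3])
ratio_after = abs(compressed_bins[1]) / abs(compressed_bins[3])
print(f"dynamic range: {ratio_before:.0f}x -> {ratio_after:.2f}x")
```

The exponent narrows the gap between loud and quiet bins, so low-energy speech components carry more weight relative to high-energy ones when a network is trained on the compressed spectrum.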
