UL-UNAS: Ultra-Lightweight U-Nets for Real-Time Speech Enhancement via Network Architecture Search
Xiaobin Rong, Leyan Yang, Dahan Wang, Yuxiang Hu, Changbao Zhu, Kai Chen, Jing Lu
TL;DR
UL-UNAS targets real-time speech enhancement (SE) on edge devices by combining an ultra-lightweight U-Net backbone with neural architecture search (NAS). It introduces two capacity-boosting components, affine PReLU (APReLU) and causal time-frequency attention (cTFA). It also shows that a MACs-aware NAS can find architectures that outperform prior ultra-lightweight models while preserving real-time processing with zero look-ahead. On the large-scale DNS3 benchmark and on VCTK-DEMAND, UL-UNAS achieves competitive PESQ and DNSMOS scores with about 35M MACs and 171k parameters, and it surpasses considerably heavier baselines on several metrics. The work offers practical guidance for designing efficient SE models. Its source code and audio demos are released to support reproducibility and deployment.
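Zero look-ahead processing is commonly obtained with causal convolutions along the time axis. In a causal convolution, each output frame depends only on the current and past input frames, so it can be emitted as soon as a frame arrives. The following pure-Python sketch illustrates this generic technique on a single channel. It is not taken from the UL-UNAS code. The names `causal_conv1d` and `StreamingCausalConv1d` and the toy weights are hypothetical. The sketch checks two properties: the offline and frame-by-frame streaming versions give identical outputs, and changing a future input leaves earlier outputs untouched.

```python
from collections import deque
from typing import List, Sequence


def causal_conv1d(x: Sequence[float], w: Sequence[float], dilation: int = 1) -> List[float]:
    """Offline causal 1-D convolution over time (hypothetical helper).

    y[t] = sum_i w[i] * x[t - i * dilation], with x[t] = 0 for t < 0,
    so each output uses only the current and past frames (zero look-ahead).
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, wi in enumerate(w):
            src = t - i * dilation
            if src >= 0:  # frames before the start are treated as zeros
                acc += wi * x[src]
        y.append(acc)
    return y


class StreamingCausalConv1d:
    """Frame-by-frame equivalent that caches past inputs (hypothetical helper)."""

    def __init__(self, w: Sequence[float], dilation: int = 1):
        self.w = list(w)
        self.dilation = dilation
        # The receptive field spans (k - 1) * dilation + 1 frames; newest frame on the right.
        size = (len(self.w) - 1) * dilation + 1
        self.buf = deque([0.0] * size, maxlen=size)

    def step(self, frame: float) -> float:
        self.buf.append(frame)  # the oldest cached frame is dropped automatically
        newest = len(self.buf) - 1
        return sum(wi * self.buf[newest - i * self.dilation] for i, wi in enumerate(self.w))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    w = [0.5, 0.25, 0.25]
    offline = causal_conv1d(x, w, dilation=2)
    stream = StreamingCausalConv1d(w, dilation=2)
    online = [stream.step(v) for v in x]
    print(offline)  # -> [0.5, 1.0, 1.75, 2.5, 3.5]
    print(online == offline)  # -> True
```

Because the streaming version needs only a small cache of past frames, its per-frame latency does not grow with the length of the input.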
Abstract
Lightweight models are essential for real-time speech enhancement applications, and recent years have seen a growing trend toward increasingly compact speech enhancement models. In this paper, we propose an Ultra-Lightweight U-Net optimized by Network Architecture Search (UL-UNAS), which is suitable for deployment on low-footprint devices. First, we explore the application of various efficient convolutional blocks within the U-Net framework to identify the most promising candidates. Second, we introduce two boosting components to enhance the capacity of these convolutional blocks: a novel activation function named affine PReLU and a causal time-frequency attention module. Furthermore, we leverage neural architecture search to discover an optimal architecture within our carefully designed search space. By integrating these strategies, UL-UNAS not only significantly outperforms the latest ultra-lightweight models with the same or lower computational complexity, but also delivers competitive performance compared with recent baseline models that require substantially higher computational resources. Source code and audio demos are available at https://github.com/Xiaobin-Rong/ul-unas.
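Computational complexity here is measured in multiply-accumulate operations (MACs). Efficient convolutional blocks reduce this count compared with standard convolutions. The abstract does not say which blocks were explored, so the sketch below uses the depthwise-separable convolution, a widely used efficient block, purely as an illustration. It applies the standard MAC formula for a 2-D convolution: each output element costs `(c_in / groups) * k_t * k_f` MACs. The layer sizes (16 channels, a 3x3 time-frequency kernel, 65 frequency bins, one output frame) are hypothetical and do not come from UL-UNAS. Bias terms and activations are ignored.

```python
def conv2d_macs(c_in: int, c_out: int, k_t: int, k_f: int,
                t_out: int, f_out: int, groups: int = 1) -> int:
    """MACs of one 2-D convolution over a (time, frequency) feature map.

    Every output element needs (c_in / groups) * k_t * k_f multiply-accumulates.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    return c_out * t_out * f_out * (c_in // groups) * k_t * k_f


def depthwise_separable_macs(c_in: int, c_out: int, k_t: int, k_f: int,
                             t_out: int, f_out: int) -> int:
    """Depthwise k_t x k_f convolution followed by a pointwise 1x1 convolution."""
    depthwise = conv2d_macs(c_in, c_in, k_t, k_f, t_out, f_out, groups=c_in)
    pointwise = conv2d_macs(c_in, c_out, 1, 1, t_out, f_out)
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 16 -> 16 channels, 3x3 kernel, one frame of 65 frequency bins.
    standard = conv2d_macs(16, 16, 3, 3, 1, 65)
    separable = depthwise_separable_macs(16, 16, 3, 3, 1, 65)
    print(standard, separable, round(standard / separable, 2))  # -> 149760 26000 5.76
```

In this example the separable block needs about 5.8 times fewer MACs. In general, the ratio is roughly `1 / (1/c_out + 1/(k_t * k_f))`. Multiplying the per-frame MAC count by the frame rate gives the MACs-per-second figure typically reported for streaming SE models.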
