UL-UNAS: Ultra-Lightweight U-Nets for Real-Time Speech Enhancement via Network Architecture Search

Xiaobin Rong, Leyan Yang, Dahan Wang, Yuxiang Hu, Changbao Zhu, Kai Chen, Jing Lu

TL;DR

UL-UNAS targets real-time speech enhancement (SE) on edge devices by combining an ultra-lightweight U-Net backbone with neural architecture search (NAS). It introduces two capacity-boosting components, affine PReLU (APReLU) and causal time-frequency attention (cTFA), and shows that a MACs-aware NAS can yield architectures that outperform prior ultra-lightweight models while retaining real-time processing with zero look-ahead. On the large-scale DNS3 and VCTK-DEMAND benchmarks, UL-UNAS achieves competitive PESQ and DNSMOS scores with around 35M MACs and 171k parameters, surpassing heavier baselines on many metrics. The work offers practical guidance for designing efficient SE models, and its source code and audio demos are available to support reproducibility and deployment.

Abstract

Lightweight models are essential for real-time speech enhancement applications. In recent years, there has been a growing trend toward developing increasingly compact models for speech enhancement. In this paper, we propose an Ultra-Lightweight U-Net optimized by Network Architecture Search (UL-UNAS), which is suitable for implementation on low-footprint devices. Firstly, we explore the application of various efficient convolutional blocks within the U-Net framework to identify the most promising candidates. Secondly, we introduce two boosting components to enhance the capacity of these convolutional blocks: a novel activation function named affine PReLU and a causal time-frequency attention module. Furthermore, we leverage neural architecture search to discover an optimal architecture within our carefully designed search space. By integrating the above strategies, UL-UNAS not only significantly outperforms the latest ultra-lightweight models with the same or lower computational complexity, but also delivers competitive performance compared to recent baseline models that require substantially higher computational resources. Source code and audio demos are available at https://github.com/Xiaobin-Rong/ul-unas.

Paper Structure

This paper contains 31 sections, 11 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Overall architecture of UL-UNAS.
  • Figure 2: Efficient convolutional blocks modified for ultra-lightweight SE. (a) Conv block. (b) DWS block. (c) Ghost block. (d) Rep block. (e) MB block. (f) Star block. PW and DW refer to pointwise and depthwise, respectively, and Rep represents reparameterizable, with details omitted in this figure.
  • Figure 3: Extended convolutional blocks integrated with APReLU and cTFA. (a) XConv block. (b) XDWS block. (c) XMB block.
  • Figure 4: Examples of the proposed APReLU. (a) $\gamma=\beta=\alpha=0$. (b) $\gamma=0.3$, $\beta=0.5$, $\alpha=0.1$. (c) $\gamma=-0.3$, $\beta=-0.5$, $\alpha=-0.1$. (d) $\gamma=-1.3$, $\beta=-0.5$, $\alpha=0.1$.
  • Figure 5: Detailed architecture of the proposed cTFA module.
  • ...and 3 more figures