Single-Channel Distance-Based Source Separation for Mobile GPU in Outdoor and Indoor Environments
Hanbin Bae, Byungjun Kang, Jiwon Kim, Jaeyong Hwang, Hosang Sung, Hoon-Young Cho
TL;DR
The paper tackles single-channel distance-based source separation (DSS) in outdoor and indoor environments and proposes a mobile-friendly architecture that combines TS-Conformer blocks, linear relation-aware self-attention (RSA), and the TensorFlow Lite GPU delegate to achieve energy-efficient, real-time inference. The signal model partitions the mixture into near and far sources using impulse responses simulated with Pyroomacoustics, and the model is trained on mixed outdoor–indoor data that includes challenging outdoor noise. A baseline CMGAN is extended with a linear RSA that reduces the attention cost from $O(N^2 d)$, quadratic in the sequence length $N$, to $O(N d^2)$, linear in $N$ for feature dimension $d$, while maintaining separation quality. Mobile-GPU optimizations make on-device deployment practical. Experiments on simulated and real outdoor data show substantial energy and speed gains on mobile hardware, and training with outdoor data improves performance over indoor-only training.
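The complexity reduction from $O(N^2 d)$ to $O(N d^2)$ can be illustrated with a minimal NumPy sketch of generic kernelized linear attention. This is not the paper's linear RSA: the relation-aware terms are omitted, and the `elu(x) + 1` feature map, the function names, and the sizes `N` and `d` are all assumptions chosen for illustration. The sketch only shows that reordering the matrix products avoids building the $N \times N$ score matrix while giving the same output.

```python
import numpy as np


def feature_map(x):
    # elu(x) + 1, a positive feature map (an assumed choice, not taken from the paper).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def attention_quadratic_order(q, k, v):
    # Builds the full (N, N) score matrix: O(N^2 d) time, O(N^2) memory.
    phi_q, phi_k = feature_map(q), feature_map(k)
    scores = phi_q @ phi_k.T                               # (N, N)
    weights = scores / scores.sum(axis=1, keepdims=True)   # row-normalize
    return weights @ v                                     # (N, d)


def attention_linear_order(q, k, v):
    # Same result, but the keys and values are summarized first: O(N d^2) time.
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v                                       # (d, d) summary
    z = phi_k.sum(axis=0)                                  # (d,) normalizer
    return (phi_q @ kv) / (phi_q @ z)[:, None]             # (N, d)


rng = np.random.default_rng(0)
N, d = 256, 16  # hypothetical sequence length and feature dimension
q, k, v = (rng.standard_normal((N, d)) for _ in range(3))

out_quadratic = attention_quadratic_order(q, k, v)
out_linear = attention_linear_order(q, k, v)
print("outputs match:", np.allclose(out_quadratic, out_linear))
```

Because the feature map replaces the softmax, this is an approximation of standard attention rather than an exact rewrite of it; the two functions above agree with each other only because they use the same kernel.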
Abstract
This study emphasizes the significance of exploring distance-based source separation (DSS) in outdoor environments. Unlike existing studies, which focus primarily on indoor settings, the proposed model is designed to capture the unique characteristics of outdoor audio sources. It incorporates a two-stage conformer block, a linear relation-aware self-attention (RSA), and a TensorFlow Lite GPU delegate. Although the linear RSA may not capture physical cues as explicitly as the quadratic RSA, it enhances the model's context awareness, which improves performance on DSS, a task that requires an understanding of physical cues in both outdoor and indoor environments. The experimental results demonstrate that the proposed model overcomes the limitations of existing approaches and considerably improves energy efficiency and real-time inference speed on mobile devices.
