Unified Microphone Conversion: Many-to-Many Device Mapping via Feature-wise Linear Modulation
Myeonghoon Ryu, Hongseok Oh, Suji Lee, Han Park
TL;DR
The paper tackles sound event classification (SEC) under device variability caused by differences in recording hardware. It proposes Unified Microphone Conversion, a FiLM-conditioned CycleGAN framework that achieves many-to-many device mappings using a single generator and multiple discriminators, augmented by a synthetic frequency response (FR) difference generator. Key contributions include a FiLM encoder that modulates feature statistics with device-specific embeddings, the integration of frequency-response information into the generator, and a scalable synthetic FR difference strategy that reduces data collection needs. Empirical results show a $2.6\%$ improvement in macro-average F1 and a $0.8\%$ reduction in variability relative to the state of the art, indicating scalable, robust SEC performance across diverse devices.
Abstract
We present Unified Microphone Conversion, a unified generative framework designed to make sound event classification (SEC) systems robust to device variability. While our prior CycleGAN-based methods effectively simulate device characteristics, they require a separate model for each device pair, which limits scalability. Our approach overcomes this constraint by conditioning the generator on frequency response data, enabling many-to-many device mappings through unpaired training. We integrate frequency-response information via Feature-wise Linear Modulation (FiLM), further enhancing scalability. Additionally, incorporating synthetic frequency response differences makes the framework more applicable in real-world settings. Experimental results show that our method outperforms the state of the art by 2.6% and reduces variability by 0.8% in macro-average F1 score.
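To make the conditioning mechanism concrete, the following is a minimal sketch of Feature-wise Linear Modulation in plain Python: a conditioning vector is mapped to a per-channel scale (gamma) and shift (beta), which are then applied to a feature map. It is an illustration under stated assumptions, not the architecture from the paper: the helper names `linear` and `film`, the toy weights, the 3-bin conditioning vector, and the 2-channel feature map are all hypothetical, and a real generator would learn the mappings and operate on spectrogram features.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def linear(vec: Vector, weights: Matrix, bias: Vector) -> List[float]:
    """Affine map y = W v + b, with `weights` given as (out_dim x in_dim) rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def film(features: Matrix, gamma: Vector, beta: Vector) -> List[List[float]]:
    """Apply FiLM: scale and shift every value of channel c by gamma[c], beta[c]."""
    return [[g * x + b for x in channel]
            for channel, g, b in zip(features, gamma, beta)]


# Hypothetical conditioning input: a 3-bin frequency response difference.
fr_diff = [0.5, -1.0, 0.0]

# Hypothetical (normally learned) projections from the condition to gamma/beta.
W_gamma = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b_gamma = [1.0, 1.0]
W_beta = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
b_beta = [0.0, 0.0]

gamma = linear(fr_diff, W_gamma, b_gamma)  # one scale per channel
beta = linear(fr_diff, W_beta, b_beta)     # one shift per channel

# Toy feature map: 2 channels x 2 time frames.
features = [[1.0, 2.0], [3.0, 4.0]]
modulated = film(features, gamma, beta)
print(modulated)
```

Because the condition enters only through gamma and beta, the same set of weights can serve any target device: changing `fr_diff` changes the modulation without requiring a separate model per device pair, which is the scalability argument made above.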
