Multiplexing Neural Audio Watermarks
Zheqi Yuan, Yucheng Huang, Guangzhi Sun, Zengrui Jin, Chao Zhang
TL;DR
Multiplexing several audio watermarks, either with the training-free PA-TFM or the model-based MaskNet, considerably outperforms existing single-watermark baselines on the LibriSpeech and Common Voice datasets under 14 diverse attack types, establishing a resilient paradigm for real-world audio protection.
Abstract
Audio watermarking is essential for verifying speech authenticity, yet single-watermark schemes often struggle against sophisticated distortions such as neural reconstruction and adversarial attacks. To address this limitation, we introduce a multiplexing paradigm that combines multiple watermarking techniques to leverage their inherent complementarities. We explore both parallel and sequential multiplexing strategies and propose perceptual-adaptive time-frequency multiplexing (PA-TFM), a robust training-free approach. To further enhance performance, we introduce MaskNet, a novel model-based framework designed to learn effective time-domain multiplexing. Experimental results on the LibriSpeech and Common Voice datasets under 14 diverse attack types, including high-strength white-box and neural reconstruction attacks, demonstrate that both PA-TFM and MaskNet considerably outperform existing single-watermark baselines, establishing a resilient paradigm for real-world audio protection.
