Multiplexing Neural Audio Watermarks

Zheqi Yuan, Yucheng Huang, Guangzhi Sun, Zengrui Jin, Chao Zhang

TL;DR

Experimental results on the LibriSpeech and Common Voice datasets under 14 diverse attack types demonstrate that both PA-TFM and MaskNet considerably outperform existing single-watermark baselines, establishing a resilient paradigm for real-world audio protection.

Abstract

Audio watermarking is essential for verifying speech authenticity, yet single-watermark schemes often struggle against sophisticated distortions such as neural reconstruction and adversarial attacks. To address this limitation, we introduce a multiplexing paradigm that combines multiple watermarking techniques to leverage their inherent complementarities. We explore both parallel and sequential multiplexing strategies and propose perceptual-adaptive time-frequency multiplexing (PA-TFM), a robust training-free approach. To further enhance performance, we introduce MaskNet, a novel model-based framework designed to learn effective time-domain multiplexing. Experimental results on the LibriSpeech and Common Voice datasets under 14 diverse attack types, including high-strength white-box and neural reconstruction attacks, demonstrate that both PA-TFM and MaskNet considerably outperform existing single-watermark baselines, establishing a resilient paradigm for real-world audio protection.
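
To make the distinction between parallel and sequential multiplexing concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy, host-dependent additive embedder (`embed_residual`) and a simple correlation detector (`detect`); both are hypothetical stand-ins for the pre-trained neural watermarkers, and the names, keys, and `strength` parameter are illustrative assumptions.

```python
import numpy as np


def _pattern(key, shape):
    # Hypothetical key-seeded pseudo-random carrier (assumption, not from the paper).
    return np.random.default_rng(key).standard_normal(shape)


def embed_residual(audio, key, strength=0.05):
    # Toy embedder: the residual is scaled by the local host amplitude,
    # so the watermark depends on the signal it is embedded into.
    return strength * np.abs(audio) * _pattern(key, audio.shape)


def parallel_multiplex(audio, keys, strength=0.05):
    # Parallel: every watermark is computed from the same clean host,
    # and the residuals are summed.
    residual = sum(embed_residual(audio, k, strength) for k in keys)
    return audio + residual


def sequential_multiplex(audio, keys, strength=0.05):
    # Sequential: each embedder receives the output of the previous one.
    out = audio
    for k in keys:
        out = out + embed_residual(out, k, strength)
    return out


def detect(audio, key):
    # Toy detector: normalised correlation with the key's carrier.
    return float(audio @ _pattern(key, audio.shape) / audio.size)
```

In this toy setting the two strategies give different outputs only because the embedder depends on its input; the second watermark in the sequential chain sees audio that already carries the first. How the actual watermarkers interact, and how PA-TFM and MaskNet allocate the signal between them, is not captured by this sketch.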

Paper Structure

This paper contains 14 sections, 6 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overview of the proposed multiplexing architectures: (a) PA-TFM uses heuristic time-frequency masking; (b) MaskNet uses learned time-domain masking. MaskNet optimises the mask distribution via a differentiable attack loop while keeping the pre-trained watermark extractors frozen.
  • Figure 2: TPR curves for watermarks A and P at different attack strengths (SNR). (a) Gaussian noise, where watermark P degrades more slowly; (b) room impulse response, where watermark A degrades more slowly. These complementary effects illustrate the benefit of combining both watermarks.
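
The complementarity described in Figure 2 can be illustrated with a small sketch. It assumes an OR-style fusion rule, in which a clip counts as detected if either detector fires; this rule is an assumption for illustration and is not stated in the source. The function names are hypothetical, and the inputs are per-clip boolean detection flags on watermarked audio.

```python
import numpy as np


def true_positive_rate(detected):
    # Fraction of watermarked clips that were flagged as watermarked.
    return float(np.mean(np.asarray(detected, dtype=bool)))


def fuse_or(detected_a, detected_p):
    # Hypothetical fusion rule (assumption): a clip is detected
    # if at least one of the two watermark detectors fires.
    a = np.asarray(detected_a, dtype=bool)
    p = np.asarray(detected_p, dtype=bool)
    return np.logical_or(a, p)
```

Under this rule the fused TPR is never lower than that of either detector alone, so a watermark that survives a given attack can cover for one that does not. The same rule also raises the false positive rate to at least the larger of the two individual rates, so detection thresholds would need to be set with that in mind.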