AudioMarkBench: Benchmarking Robustness of Audio Watermarking

Hongbin Liu, Moyang Guo, Zhengyuan Jiang, Lun Wang, Neil Zhenqiang Gong

TL;DR

This work presents AudioMarkBench, the first systematic benchmark for evaluating the robustness of audio watermarking against watermark removal and watermark forgery, and benchmarks the robustness of state-of-the-art watermarking methods against perturbations in no-box, black-box, and white-box settings.

Abstract

The increasing realism of synthetic speech, driven by advancements in text-to-speech models, raises ethical concerns regarding impersonation and disinformation. Audio watermarking offers a promising solution by embedding human-imperceptible watermarks into AI-generated audio. However, the robustness of audio watermarking against common and adversarial perturbations remains understudied. We present AudioMarkBench, the first systematic benchmark for evaluating the robustness of audio watermarking against watermark removal and watermark forgery. AudioMarkBench includes a new dataset created from Common-Voice across languages, biological sexes, and ages, 3 state-of-the-art watermarking methods, and 15 types of perturbations. We benchmark the robustness of these methods against the perturbations in no-box, black-box, and white-box settings. Our findings highlight the vulnerabilities of current watermarking techniques and emphasize the need for more robust and fair audio watermarking solutions. Our dataset and code are publicly available at https://github.com/moyangkuo/AudioMarkBench.

Paper Structure (24 sections, 3 equations, 31 figures, 11 tables, 4 algorithms)

Figures (31)

  • Figure 1: Summary of our AudioMarkBench.
  • Figure 2: Detection results under no perturbations on AudioMarkData. We set the detection threshold $\tau$ for each watermarking method as follows: AudioSeal $\tau=0.15$, AudioSeal-B $\tau=0.875$, WavMark $\tau=0.0$, and Timbre $\tau=0.8125$, to achieve $\text{FPR}<0.01$ and $\text{FNR}<0.01$. Results for LibriSpeech are in Figure \ref{figure:no_attack_librispeech} in the Appendix.
  • Figure 3: Detection results under EnCodeC perturbations on both datasets (first row: AudioMarkData; second row: LibriSpeech). Results of the other eleven no-box perturbations are in Appendix \ref{appendix:no-box perturbation}.
  • Figure 4: FNRs by biological sex against watermark-removal (a) Gaussian noise perturbations, (b) Square attack perturbations, and (c) white-box perturbations. (d) FPRs by biological sex against watermark-forgery EnCodeC perturbations. The watermarking method is AudioSeal. The gaps between "female" and "male" are statistically significant in a two-tailed t-test with $p$-value $< \alpha=0.05$.
  • Figure 5: Language difference against watermark-removal Gaussian noise perturbations with SNR 20. The watermarking method is AudioSeal.
  • ...and 26 more figures
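
The captions above refer to two quantities that are easy to illustrate: Gaussian noise added at a target SNR (Figure 5 uses SNR 20) and FPR/FNR measured at a detection threshold $\tau$ (Figure 2). The following is a minimal sketch of both, not the authors' implementation. The function names, the synthetic sine-wave input, the toy detector scores, and the convention that audio is flagged as watermarked when its score is at least $\tau$ are all assumptions made for illustration.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, rng):
    """Return a copy of `signal` with white Gaussian noise at the target SNR (dB)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for the noise power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR (dB) of `noisy` relative to `clean`."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


def fpr_fnr(scores_unwatermarked, scores_watermarked, tau):
    """FPR and FNR at threshold tau.

    Assumed convention: a score >= tau means "detected as watermarked".
    """
    false_pos = sum(s >= tau for s in scores_unwatermarked)
    false_neg = sum(s < tau for s in scores_watermarked)
    return (false_pos / len(scores_unwatermarked),
            false_neg / len(scores_watermarked))


# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = add_gaussian_noise(clean, snr_db=20, rng=rng)
snr = measured_snr_db(clean, noisy)

# Toy detector scores (made up), evaluated at the Timbre threshold from Figure 2.
fpr, fnr = fpr_fnr([0.1, 0.2, 0.9, 0.3], [0.95, 0.85, 0.7, 0.99], tau=0.8125)
print(f"measured SNR: {snr:.2f} dB, FPR: {fpr}, FNR: {fnr}")
```

In this framing, a watermark-removal perturbation succeeds when it pushes the scores of watermarked audio below $\tau$ (raising FNR), and a watermark-forgery perturbation succeeds when it pushes the scores of unwatermarked audio to $\tau$ or above (raising FPR).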