Yours or Mine? Overwriting Attacks Against Neural Audio Watermarking

Lingfeng Yao, Chenpei Huang, Shengyao Wang, Junpei Xue, Hanqing Guo, Jiang Liu, Phone Lin, Tomoaki Ohtsuki, Miao Pan

TL;DR

This paper reveals a fundamental security vulnerability in neural audio watermarking: overwriting attacks can effectively replace an original watermark with a forged one across white-box, gray-box, and black-box settings. It develops three threat-model-based attack designs, including a surrogate-training framework for gray-box scenarios and both zero- and query-based strategies for black-box cases, and demonstrates near-100% attack success across three state-of-the-art watermarking methods. The results show that ownership verification can be hijacked even when security relies on obscurity or on keeping model weights secret, underscoring the need to integrate explicit security mechanisms into watermarking designs. Overall, the work shifts the focus of neural audio watermarking from robustness and imperceptibility to proactive defense against adversarial overwriting, with implications for copyright protection and provenance verification in AI-generated audio.

Abstract

As generative audio models rapidly evolve, AI-generated audio increasingly raises concerns about copyright infringement and the spread of misinformation. Audio watermarking, as a proactive defense, can embed secret messages into audio for copyright protection and source verification. However, current neural audio watermarking methods focus primarily on the imperceptibility and robustness of watermarking, while ignoring its vulnerability to security attacks. In this paper, we develop a simple yet powerful attack: the overwriting attack, which replaces the legitimate audio watermark with a forged one and makes the original watermark undetectable. Based on the audio watermarking information available to the adversary, we propose three categories of overwriting attacks, i.e., white-box, gray-box, and black-box attacks. We also thoroughly evaluate the proposed attacks on state-of-the-art neural audio watermarking methods. Experimental results demonstrate that the proposed overwriting attacks can effectively compromise existing watermarking schemes across various settings and achieve an attack success rate of nearly 100%. The practicality and effectiveness of the proposed overwriting attacks expose security flaws in existing neural audio watermarking systems, underscoring the need to enhance security in future audio watermarking designs.

Paper Structure

This paper contains 27 sections, 2 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Overview of the proposed audio watermark overwriting attack. An adversary injects a new forged watermark into an already watermarked audio, erasing the original legitimate watermark. Thus, legitimate ownership cannot be verified, and the adversary can falsely claim the copyright.
  • Figure 2: Bit error rate (%) of the original watermark after white-box overwriting.
  • Figure 3: Bit error rate (%) distributions of the original watermark for AudioSeal and Timbre under gray-box settings.
  • Figure 4: Spectrogram comparison of the unwatermarked audio and three watermarked versions. Red boxes highlight the spectral perturbations introduced by the watermark.
  • Figure 5: Black-box reproduction attacks on three watermarking systems.
  • ...and 1 more figure
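
The captions of Figures 2 and 3 report the bit error rate (%) of the original watermark after an overwriting attack. This page does not give the paper's exact formula, so the following is only a minimal sketch under the common assumption that the bit error rate is the percentage of message bits that the decoder recovers incorrectly. The function name `bit_error_rate` and the example bit strings are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def bit_error_rate(original: Sequence[int], decoded: Sequence[int]) -> float:
    """Return the percentage of positions where `decoded` differs from `original`.

    Assumption: both arguments are equal-length sequences of 0/1 message bits.
    """
    if len(original) != len(decoded):
        raise ValueError("bit sequences must have the same length")
    if len(original) == 0:
        raise ValueError("bit sequences must not be empty")
    # Count mismatched bit positions.
    errors = sum(1 for a, b in zip(original, decoded) if a != b)
    return 100.0 * errors / len(original)


# Hypothetical 8-bit message embedded by the legitimate owner.
owner_bits = [1, 0, 1, 1, 0, 0, 1, 0]
# Hypothetical bits the owner's decoder recovers after an overwriting attack.
decoded_after_attack = [0, 0, 1, 0, 0, 1, 1, 1]

print(bit_error_rate(owner_bits, decoded_after_attack))  # 50.0
```

Under this assumed definition, a value near 0% means the original watermark is still recovered, while a value near 50% means the decoded bits match the original message no better than random guessing, which is consistent with the abstract's description of the original watermark becoming undetectable.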