HarmonicAttack: An Adaptive Cross-Domain Audio Watermark Removal
Kexin Li, Xiao Hu, Ilya Grishchenko, David Lie
TL;DR
HarmonicAttack addresses the growing risk of misused AI-generated audio by evaluating watermark robustness with a closed-box, learning-based watermark removal method. It uses a dual-path autoencoder trained GAN-style with a psychoacoustic-aware, multi-component loss to remove watermarks while preserving audio quality, running in near real time and transferring well across domains. Across speech and music, it achieves a higher attack success rate (ASR) and better perceptual quality than baseline removal methods, exposing vulnerabilities in current watermarking schemes. The work motivates watermarking defenses that remain robust to adaptive, cross-domain attacks and informs practical deployment and policy considerations.
Abstract
The availability of high-quality, AI-generated audio raises security challenges such as misinformation campaigns and voice-cloning fraud. A key defense against the misuse of AI-generated audio is to watermark it, so that it can be easily distinguished from genuine audio. Because those seeking to misuse AI-generated audio may try to remove these watermarks, studying effective watermark removal techniques is critical for objectively evaluating how robust audio watermarks are to removal. Previous watermark removal schemes either assume impractical knowledge of the watermarks they are designed to remove or are computationally expensive, potentially creating a false sense of confidence in current watermark schemes. We introduce HarmonicAttack, an efficient audio watermark removal method that requires only the basic ability to generate watermarks with the targeted scheme and nothing else. With this ability, we train a general watermark removal model that can remove the targeted scheme's watermarks from any watermarked audio sample. HarmonicAttack employs a dual-path convolutional autoencoder that operates in both the temporal and frequency domains, together with GAN-style training, to separate the watermark from the original audio. When evaluated against the state-of-the-art watermark schemes AudioSeal, WavMark, and Silentcipher, HarmonicAttack removes watermarks more effectively than previous watermark removal methods while running in near real time. Moreover, although HarmonicAttack requires training, we find that it transfers to out-of-distribution samples with minimal degradation in performance.
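The abstract describes the dual-path architecture only at a high level, so the following is a toy, stdlib-only Python illustration of the idea, not the authors' model. It assumes the network estimates the watermark as a residual that is subtracted from the input, and it replaces the learned convolutional layers with a fixed high-pass kernel (time path) and fixed per-bin gains on non-overlapping DFT frames (frequency path). The function names (time_path, frequency_path, remove_watermark, high_bins), the 50/50 fusion rule, and the toy signal are all hypothetical; GAN-style training and the loss are not shown.

```python
import cmath
import math


def time_path(x, kernel):
    """Time-domain path: 'same'-length 1-D filter with zero padding.

    Like deep-learning conv layers, this computes a cross-correlation.
    The fixed kernel is a hypothetical stand-in for learned weights.
    """
    pad = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - pad
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def frequency_path(x, frame_len, bin_gain):
    """Frequency-domain path: per-frame DFT, per-bin gain, inverse DFT.

    Frames are non-overlapping with a rectangular window, so all-ones gains
    reproduce the input (up to float error). bin_gain(f, n) sees only the
    folded bin index f = min(k, n - k), which keeps the spectrum conjugate-
    symmetric so the inverse transform is real.
    """
    out = []
    for start in range(0, len(x), frame_len):
        frame = x[start:start + frame_len]
        n = len(frame)
        spec = dft(frame)
        masked = [spec[k] * bin_gain(min(k, n - k), n) for k in range(n)]
        out.extend(idft(masked))
    return out


def remove_watermark(x, kernel, frame_len, bin_gain, mix=0.5):
    """Fuse both paths' residual estimates (assumed weighted sum), then subtract."""
    r_time = time_path(x, kernel)
    r_freq = frequency_path(x, frame_len, bin_gain)
    residual = [mix * a + (1 - mix) * b for a, b in zip(r_time, r_freq)]
    return [s - r for s, r in zip(x, residual)]


def high_bins(f, n):
    """Hypothetical gain: keep bins at or above a quarter of the sample rate."""
    return 1.0 if f >= n // 4 else 0.0


def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Toy data: a slow sine as "audio" plus a tiny alternating "watermark".
audio = [math.sin(2 * math.pi * t / 32) for t in range(64)]
mark = [0.05 * (-1) ** t for t in range(64)]
marked = [a + m for a, m in zip(audio, mark)]

# [-0.25, 0.5, -0.25] is a high-pass kernel: gain 0 at DC, gain 1 at Nyquist.
cleaned = remove_watermark(marked, [-0.25, 0.5, -0.25], 16, high_bins)
print(f"MSE before: {mse(marked, audio):.5f}, after: {mse(cleaned, audio):.5f}")
```

Because this toy watermark sits at a single known frequency, hand-picked filters are enough to shrink it; removing real watermarks requires the trained model the abstract describes. Non-overlapping rectangular frames were chosen here only because they invert exactly; they introduce artifacts at frame boundaries, which is why spectral processing usually uses overlapping, windowed STFT frames instead.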
