Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics
Xiaoshuai Wu, Xin Liao, Bo Ou, Yuling Liu, Zheng Qin
TL;DR
This work addresses the risk that conventional provenance watermarks, when applied to forged images, can impair Deepfake detectors. It introduces AdvMark, a two-stage approach that first establishes robust watermarking and then adversarially fine-tunes it against a surrogate detector. The resulting watermark encoder and decoder both enable provenance extraction and improve forensic detectability without tuning deployed detectors. Empirical results across multiple Deepfake types and detectors show strong white-box gains (near 100% accuracy) and transferable improvements to unseen detectors, with robust watermark extraction and acceptable visual quality. By treating adversarial watermarking as a helpful tool, the paper presents a practical, plug-and-play solution that reconciles provenance tracking with proactive forensics in real-world settings, and it outlines avenues for extending the approach to broader forensic tasks.
Abstract
AI-generated content has accelerated progress in media synthesis, particularly Deepfake, which can manipulate portraits for positive or malicious purposes. One promising forensic solution is to inject robust watermarks into these potentially harmful face images before release, so that their provenance can be tracked. However, we argue that current watermarking models, originally devised for genuine images, may harm deployed Deepfake detectors when applied directly to forged images, since the watermarks are prone to overlap with the forgery signals used for detection. To bridge this gap, we propose AdvMark, a proactive forensics method that exploits the adversarial vulnerability of passive detectors for good. Specifically, AdvMark serves as a plug-and-play procedure for fine-tuning any robust watermarking into adversarial watermarking, enhancing the forensic detectability of watermarked images while the watermarks can still be extracted for provenance tracking. Extensive experiments demonstrate the effectiveness of AdvMark, which leverages robust watermarking to fool Deepfake detectors in a beneficial direction, improving the accuracy of downstream Deepfake detection without tuning the in-the-wild detectors. We believe this work will shed light on harmless proactive forensics against Deepfake.
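
To make the fine-tuning idea concrete, the following is a minimal sketch of how a combined objective for the adversarial stage might look. It is not the paper's actual loss: the function name `finetune_loss`, the three weights, the use of mean squared error for image fidelity, and the use of binary cross-entropy for both the message and the detector terms are all assumptions made for illustration. The only idea taken from the text above is that the watermark should stay extractable and visually unobtrusive while pushing a surrogate detector toward the correct real/fake label.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one probability p and a 0/1 target y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mse(a, b):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def finetune_loss(cover, watermarked, message, decoded_probs,
                  detector_fake_prob, is_fake,
                  w_image=1.0, w_message=1.0, w_detector=1.0):
    """Hypothetical three-term objective (all terms are assumptions).

    cover, watermarked : flattened pixel values in [0, 1]
    message            : embedded bits (0/1)
    decoded_probs      : decoder's per-bit probability of a 1
    detector_fake_prob : surrogate detector's probability of "fake"
    is_fake            : ground-truth label of the cover image
    """
    image_term = mse(cover, watermarked)            # visual quality
    message_term = sum(bce(p, m) for p, m in zip(decoded_probs, message)) / len(message)
    # "Adversarial for good": reward the detector being RIGHT, not wrong.
    detector_term = bce(detector_fake_prob, 1 if is_fake else 0)
    return w_image * image_term + w_message * message_term + w_detector * detector_term


# Toy usage: the same watermarked fake image, scored under two detector outputs.
cover = [0.20, 0.50, 0.80]
watermarked = [0.21, 0.49, 0.80]
message = [1, 0, 1]
decoded = [0.95, 0.05, 0.90]

helped = finetune_loss(cover, watermarked, message, decoded, 0.9, is_fake=True)
harmed = finetune_loss(cover, watermarked, message, decoded, 0.2, is_fake=True)
print(helped < harmed)
```

In this toy setup the loss is lower when the surrogate detector assigns a high "fake" probability to a forged image, so minimizing it would favor watermarks that help detection rather than hinder it. The weights trade off image fidelity, message recovery, and detectability; suitable values would have to be chosen empirically.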
