Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics

Xiaoshuai Wu, Xin Liao, Bo Ou, Yuling Liu, Zheng Qin

TL;DR

This work addresses the risk that watermarks embedded for provenance tracking can degrade the accuracy of deployed Deepfake detectors. It introduces AdvMark, a two-stage approach that first trains a robust watermarking encoder and decoder and then adversarially fine-tunes them against a surrogate detector. The resulting encoder and decoder support provenance extraction while making watermarked images easier to classify correctly, without tuning the deployed detectors. Experiments across multiple Deepfake types and detectors show strong white-box gains (near 100% accuracy) and improvements that transfer to unseen detectors, while watermark extraction stays robust and visual quality remains acceptable. By using adversarial watermarking constructively, the paper offers a plug-and-play method that reconciles provenance tracking with proactive forensics, and it outlines ways to extend the approach to broader forensic tasks.

Abstract

AI-generated content has accelerated progress in media synthesis, particularly Deepfake, which can manipulate our portraits for positive or malicious purposes. Before these threatening face images are released, one promising forensic solution is to inject robust watermarks that track their provenance. However, we argue that current watermarking models, originally devised for genuine images, may harm deployed Deepfake detectors when applied directly to forged images, since the watermarks are prone to overlap with the forgery signals used for detection. To bridge this gap, we propose AdvMark, on behalf of proactive forensics, to exploit the adversarial vulnerability of passive detectors for good. Specifically, AdvMark serves as a plug-and-play procedure for fine-tuning any robust watermarking into adversarial watermarking, which enhances the forensic detectability of watermarked images while the watermarks can still be extracted for provenance tracking. Extensive experiments demonstrate the effectiveness of AdvMark, which leverages robust watermarking to fool Deepfake detectors and thereby helps improve the accuracy of downstream Deepfake detection without tuning the in-the-wild detectors. We believe this work will shed some light on harmless proactive forensics against Deepfake.

Paper Structure

This paper contains 14 sections, 13 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Distinctions between the proposed AdvMark and current watermarking models. (a) Non-watermarked images provide the baseline detection performance, where genuine and forged images may not be easily distinguished. (b) Current watermarking unintentionally degrades detection performance, since the watermarks are prone to overlap with the forgery signals. (c) Our AdvMark leverages the watermarks to fool Deepfake detectors intentionally, which helps to distinguish between watermarked genuine and forged images without compromising provenance tracking.
  • Figure 2: A sketch of adversarial perturbations vs. adversarial watermarks. Left: The objective of adversarial perturbations is to make correctly predicted inputs yield wrong detection outcomes. Right: The objective of adversarial watermarks is to make originally mispredicted inputs yield correct detection outcomes.
  • Figure 3: Overview of our proposed AdvMark. The training process consists of two stages. In Stage I, the encoder and decoder are jointly trained end-to-end, obtaining the pre-trained encoder and decoder that serve as robust watermarking. In Stage II, the encoder and decoder are fine-tuned end-to-end by fooling a surrogate Deepfake detector, aiming at transforming robust watermarking into adversarial watermarking. During inference, only the final watermark encoder and decoder will be adopted.
  • Figure 4: Visualizations of watermarked images and normalized residuals. From top to bottom, the rows follow the default order in Table \ref{tab:BER_PSNR}.
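
The contrast drawn in Figure 2 can be illustrated with a toy example. The sketch below is not the paper's method: AdvMark learns a watermark encoder, whereas this code takes a single signed-gradient step on one input. The logistic-regression detector, the function names, and the step size `eps` are all assumptions made for illustration. It shows only that flipping the sign of a gradient step turns a harmful perturbation (loss ascent) into a helpful one (loss descent toward the true label).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def detector_prob_fake(x, w, b):
    """Hypothetical surrogate detector: logistic regression returning P(fake)."""
    return sigmoid(x @ w + b)

def bce_grad_wrt_input(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. x for the toy detector above."""
    return (detector_prob_fake(x, w, b) - y) * w

def harmful_perturbation(x, y, w, b, eps):
    """Signed-gradient ascent on the loss: pushes the prediction away from y."""
    return x + eps * np.sign(bce_grad_wrt_input(x, y, w, b))

def helpful_perturbation(x, y, w, b, eps):
    """Signed-gradient descent on the loss: pushes the prediction toward y."""
    return x - eps * np.sign(bce_grad_wrt_input(x, y, w, b))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=64)          # toy detector weights (assumed)
    b = 0.0
    y_fake = 1.0                     # ground truth: the image is forged
    x_fake = -0.05 * np.sign(w)      # forged sample the detector mistakes for real

    x_helped = helpful_perturbation(x_fake, y_fake, w, b, eps=0.1)
    x_harmed = harmful_perturbation(x_helped, y_fake, w, b, eps=0.1)

    print("P(fake) before:        ", detector_prob_fake(x_fake, w, b))
    print("P(fake) after helpful: ", detector_prob_fake(x_helped, w, b))
    print("P(fake) after harmful: ", detector_prob_fake(x_harmed, w, b))
```

In this toy setting the helpful step moves a forged sample that was mistaken for real back across the decision boundary, and the harmful step undoes that. A learned watermark encoder would have to achieve a similar effect while also keeping the message decodable and the image visually close to the original, which this sketch does not model.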