Evaluation of Security of ML-based Watermarking: Copy and Removal Attacks

Vitaliy Kinakh, Brian Pulfer, Yury Belousov, Pierre Fernandez, Teddy Furon, Slava Voloshynovskiy

TL;DR

The paper investigates the security of watermarking schemes that embed information in the latent spaces of foundation models, focusing on adversarial embedding attacks. It introduces two attack classes, copy attacks and removal attacks, and evaluates them on a DINOv1-based zero-bit and multi-bit watermarking setup using the DIV2K dataset. The findings show that copy attacks achieve high success rates, especially against zero-bit schemes, while removal attacks are more effective overall; targeted removals leverage specific target images or latent states to erase the watermark or degrade its recoverability. The study highlights significant vulnerabilities in latent-space watermarking with current foundation models and calls for evaluating a broader range of models and benchmarking against classical watermarking approaches. These insights have practical implications for designing more secure watermarking for AI-generated and manipulated content.
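Both attack classes can be illustrated with a toy model. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: a hypothetical linear encoder `phi` (a fixed random matrix standing in for DINOv1), a distortion budget expressed as a PSNR and converted to an L2 radius, and a feature-matching formulation for both the copy attack (pull a clean host image's latent toward the watermarked image's latent) and the targeted removal attack (pull the watermarked image's latent toward a clean target image's latent). The attacker only uses latents of images it holds, never the secret carrier `w`. The names `phi`, `feature_match`, `psnr_to_l2` and `abs_cos` are illustrative; pixel clipping and the untargeted variant are omitted.

```python
import numpy as np

# Hypothetical stand-in for the foundation-model encoder: a fixed random linear
# map from a flattened 64x64 grayscale image (pixels in [0, 1]) to a 16-d latent.
rng = np.random.default_rng(0)
D, K = 64 * 64, 16
A = rng.normal(0.0, 1.0 / np.sqrt(D), size=(K, D))

def phi(x):
    return A @ x

def abs_cos(f, w):
    return abs(f @ w) / (np.linalg.norm(f) * np.linalg.norm(w))

def psnr_to_l2(psnr_db, n_pixels, max_val=1.0):
    """L2 norm of a perturbation whose PSNR equals psnr_db."""
    mse = max_val**2 * 10.0 ** (-psnr_db / 10.0)
    return np.sqrt(n_pixels * mse)

def feature_match(x_src, f_target, psnr_db, n_steps=200):
    """Projected gradient descent on 0.5 * ||phi(x_src + delta) - f_target||^2
    with ||delta||_2 limited by the PSNR budget (pixel clipping ignored)."""
    eps = psnr_to_l2(psnr_db, x_src.size)
    lr = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, L = largest eigenvalue of A^T A
    delta = np.zeros_like(x_src)
    for _ in range(n_steps):
        residual = phi(x_src + delta) - f_target
        delta = delta - lr * (A.T @ residual)
        norm = np.linalg.norm(delta)
        if norm > eps:
            delta *= eps / norm  # project back onto the L2 ball
    return x_src + delta

# Owner side (simplified, assumed): shift a clean image's latent toward the secret carrier w.
w = rng.normal(size=K)
w /= np.linalg.norm(w)
x_clean = rng.uniform(0.0, 1.0, D)
x_wm = x_clean + np.linalg.pinv(A) @ (2 * np.linalg.norm(phi(x_clean)) * w)

# Attacker side: only latents of available images are used, never w.
x_host = rng.uniform(0.0, 1.0, D)    # non-watermarked image to forge
x_target = rng.uniform(0.0, 1.0, D)  # clean target image for targeted removal
x_copy = feature_match(x_host, phi(x_wm), psnr_db=25.0)
x_removed = feature_match(x_wm, phi(x_target), psnr_db=25.0)

print("copy:    |cos| to w", abs_cos(phi(x_host), w), "->", abs_cos(phi(x_copy), w))
print("removal: |cos| to w", abs_cos(phi(x_wm), w), "->", abs_cos(phi(x_removed), w))
```

Because the loss is a convex quadratic in the perturbation, a step of 1/L followed by projection onto the L2 ball never increases the feature distance, so both attacks move the latent monotonically toward their target. Whether the forged or cleaned image actually crosses the detection threshold depends on the budget; this linear toy is far less sensitive than a real encoder and does not reproduce the paper's numbers.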

Abstract

The vast amounts of digital content, whether captured from the real world or generated by AI, necessitate methods for copyright protection, traceability, or data provenance verification. Digital watermarking serves as a crucial approach to address these challenges. Its evolution spans three generations: handcrafted, autoencoder-based, and foundation-model-based methods. While the robustness of these systems is well documented, their security against adversarial attacks remains underexplored. This paper evaluates the security of digital watermarking systems that embed information in the latent space of foundation models using adversarial embedding techniques. A series of experiments investigates the security of these systems under copy and removal attacks, providing empirical insights into their vulnerabilities. All experimental code and results are available at https://github.com/vkinakh/ssl-watermarking-attacks.
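Adversarial embedding, the technique these latent-space schemes rely on, optimizes the pixels of the cover image until its latent representation falls inside the secret decision region. A minimal zero-bit sketch, assuming a hypothetical linear encoder `A` in place of DINOv1, a single unit-norm secret carrier `w`, a dual-hypercone detector |cos(φ(x), w)| > τ (treated here as an assumption about the detection rule), and a plain projected gradient ascent on the cosine under a PSNR budget. The real scheme's loss, data augmentations and hyperparameters are not reproduced; `cos_and_grad`, `embed_zero_bit` and `detect_zero_bit` are illustrative names.

```python
import numpy as np

# Hypothetical stand-in for the DINOv1 encoder: fixed random linear map
# from a flattened 64x64 image (pixels in [0, 1]) to a 16-d latent.
rng = np.random.default_rng(1)
D, K = 64 * 64, 16
A = rng.normal(0.0, 1.0 / np.sqrt(D), size=(K, D))

def cos_and_grad(x, w):
    """cos(A x, w) for unit-norm w, and its gradient with respect to x."""
    f = A @ x
    nf = np.linalg.norm(f)
    c = (f @ w) / nf
    grad_f = w / nf - c * f / nf**2  # d cos / d f
    return c, A.T @ grad_f

def embed_zero_bit(x0, w, psnr_db=22.0, n_steps=500, step=0.05):
    """Projected gradient ascent on cos(A (x0 + delta), w) s.t. PSNR >= psnr_db."""
    eps = np.sqrt(x0.size * 10.0 ** (-psnr_db / 10.0))  # L2 budget for max pixel value 1
    delta = np.zeros_like(x0)
    for _ in range(n_steps):
        _, g = cos_and_grad(x0 + delta, w)
        delta = delta + step * g / (np.linalg.norm(g) + 1e-12)  # normalized ascent step
        norm = np.linalg.norm(delta)
        if norm > eps:
            delta *= eps / norm  # project back onto the L2 ball
    return x0 + delta

def detect_zero_bit(x, w, tau):
    """Dual-hypercone test (assumed): watermark detected if |cos(A x, w)| > tau."""
    c, _ = cos_and_grad(x, w)
    return abs(c) > tau

w = rng.normal(size=K)
w /= np.linalg.norm(w)
x0 = rng.uniform(0.0, 1.0, D)
x_wm = embed_zero_bit(x0, w)
c_before, _ = cos_and_grad(x0, w)
c_after, _ = cos_and_grad(x_wm, w)
print(f"cos before {c_before:.3f}, after {c_after:.3f}")
```

Normalizing the gradient makes each step a fixed pixel-space distance regardless of how the cosine's gradient scales, and the projection enforces the PSNR budget. Pixel clipping to [0, 1] is omitted for brevity.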

Paper Structure (12 sections, 11 equations, 6 figures, 2 algorithms)

Figures (6)

  • Figure 1: Generalized diagram explaining the proposed (a) copy attack and (b) untargeted and targeted removal attacks, illustrated on zero-bit watermarking in the latent space. The secret carrier $\mathbf{w}$ and the decision region $\mathcal{D}_k$ (shown in gray) are unknown to the attacker.
  • Figure 2: Bit Error Rate (BER) for multi-bit watermarking under the copy attack with varying $\text{PSNR}_a$ and watermark payloads $\ell$. The attack can successfully copy the binary message (BER $<$ 1%) of the watermarked image into any non-watermarked image, even at very low distortion budgets ($\text{PSNR}_a = 45$ dB).
  • Figure 3: Probability of miss for zero-bit watermarking under the untargeted removal attack versus $\text{PSNR}_a$ of the attacked image, for varying probabilities of false acceptance. The untargeted attack achieves $P_{\text{m}}$ close to 1 at lower $\text{PSNR}_a$ values (around 40 dB), while $P_{\text{m}}$ decreases as $\text{PSNR}_a$ increases toward 50 dB.
  • Figure 4: Bit Error Rate for multi-bit watermarking under the untargeted removal attack versus $\text{PSNR}_a$, for varying payloads of $\ell$ bits. The attack increases the BER significantly, inverting the majority of the hidden bits.
  • Figure 5: Probability of miss for zero-bit watermarking under the targeted removal attack with different target image selection strategies. All targeted variants achieve higher success rates than the untargeted attack.
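The figure captions report results in terms of the attacked image's PSNR ($\text{PSNR}_a$), the bit error rate (BER) for multi-bit watermarking, and, for zero-bit watermarking, the probability of miss $P_{\text{m}}$ at a fixed probability of false acceptance. A minimal sketch of these quantities, assuming pixels in [0, 1], sign-of-projection decoding with one carrier per bit, and the hypercone rule |cos(φ(x), w)| > τ with τ calibrated under the hypothesis that non-watermarked latents point in uniformly random directions. All function names are hypothetical.

```python
import numpy as np

def psnr(x, y, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images with pixels in [0, max_val]."""
    mse = np.mean((np.asarray(x, dtype=float) - np.asarray(y, dtype=float)) ** 2)
    return 10.0 * np.log10(max_val**2 / mse)

def decode_bits(f, carriers):
    """Multi-bit decoding (assumed): bit i is 1 if <f, a_i> > 0, else 0."""
    return (carriers @ f > 0).astype(int)

def bit_error_rate(m, m_hat):
    """Fraction of message bits that differ."""
    return float(np.mean(np.asarray(m) != np.asarray(m_hat)))

def threshold_for_pfa(pfa, k, n_samples=200_000, seed=0):
    """Empirical tau such that P(|cos(u, w)| > tau) ~= pfa for u uniform on the
    unit sphere in R^k, using the fact cos(u, w)^2 ~ Beta(1/2, (k - 1)/2)."""
    rng = np.random.default_rng(seed)
    abs_cos = np.sqrt(rng.beta(0.5, (k - 1) / 2.0, size=n_samples))
    return float(np.quantile(abs_cos, 1.0 - pfa))

def prob_miss(feats, w, tau):
    """Fraction of (attacked) watermarked latents, one per row, outside the hypercone."""
    feats = np.asarray(feats, dtype=float)
    abs_cos = np.abs(feats @ w) / (np.linalg.norm(feats, axis=1) * np.linalg.norm(w))
    return float(np.mean(abs_cos <= tau))

# Usage
print(psnr(np.zeros(4), np.full(4, 0.1)))          # MSE = 0.01, so about 20 dB
print(bit_error_rate([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.25: one of four bits flipped
print(threshold_for_pfa(0.1, k=3))                 # near 0.9: |cos| is uniform on [0, 1] when k = 3
```

Sampling the Beta distribution avoids materializing random k-dimensional vectors, which matters for high-dimensional latents; the same tail probability also has a closed form as a regularized incomplete beta function, which could replace the Monte Carlo quantile.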