The Efficacy of Transfer-based No-box Attacks on Image Watermarking: A Pragmatic Analysis

Qilong Wu, Varun Chandrasekaran

TL;DR

The paper investigates the robustness of image watermarking under the no-box threat model, where the attacker has no knowledge of the victim model. It critically evaluates transfer-based evasion attacks, showing that their success hinges on strong alignment between the victim and surrogate models and on heavy computation, and that relaxing these assumptions severely limits effectiveness (an evasion rate of at most $21.1\%$). The authors propose Optimization-Free Transfer (OFT), a simple single-surrogate attack that achieves comparable or better evasion at far lower cost, and demonstrate its practical viability across multiple watermarking methods and datasets. Their findings suggest that no-box attacks may be less practical than previously claimed, motivating more realistic evaluation standards and stronger watermarking defenses. The work provides both empirical guidance and a publicly available implementation to facilitate future research and benchmarking.

Abstract

Watermarking approaches are widely used to identify whether images in circulation are authentic or AI-generated. Determining the robustness of image watermarking methods in the ``no-box'' setting, where the attacker is assumed to have no knowledge of the watermarking model, is an interesting problem. Our main finding is that evasion in the no-box setting is challenging: the success of optimization-based transfer attacks (which involve training surrogate models) proposed in prior work~\cite{hu2024transfer} depends on impractical assumptions, including (i) aligning the architecture and training configurations of the victim and the attacker's surrogate watermarking models, and (ii) using a large number of surrogate models, with potentially large computational requirements. Relaxing these assumptions, i.e., moving to a more pragmatic threat model, causes the attack to fail, with an evasion rate of at most $21.1\%$. We show that when the configuration is mostly aligned, OFT, a simple optimization-free attack we propose that uses a single surrogate model, can already exceed the success of optimization-based efforts. Under the same $\ell_\infty$ norm perturbation budget of $0.25$, prior work~\citet{hu2024transfer} is comparable to or worse than OFT in $11$ out of $12$ configurations and has a limited advantage in the remaining one. The code used for all our experiments is available at \url{https://github.com/Ardor-Wu/transfer}.
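The abstract compares attacks under a shared $\ell_\infty$ perturbation budget of $0.25$. As a minimal sketch of what such a budget means in practice (this is not the authors' OFT implementation), the snippet below projects an arbitrary perturbed image back into the $\ell_\infty$ ball of radius $\epsilon = 0.25$ around the watermarked image and clips it to a valid pixel range. The function name `project_linf` and the $[0, 1]$ pixel range are illustrative assumptions.

```python
import numpy as np


def project_linf(x_wm: np.ndarray, x_adv: np.ndarray, eps: float = 0.25,
                 lo: float = 0.0, hi: float = 1.0) -> np.ndarray:
    """Project x_adv onto the l_inf ball of radius eps around x_wm.

    Assumes pixels of x_wm already lie in [lo, hi] (hypothetical [0, 1] range).
    Clipping to [lo, hi] afterwards only moves pixels toward x_wm, so the
    result still satisfies the l_inf budget.
    """
    delta = np.clip(x_adv - x_wm, -eps, eps)  # bound each pixel change by eps
    return np.clip(x_wm + delta, lo, hi)      # keep pixels in the valid range


# Usage: a random "watermarked" image and a heavily perturbed copy of it.
rng = np.random.default_rng(0)
x_wm = rng.uniform(0.0, 1.0, size=(3, 8, 8))
x_adv = x_wm + rng.normal(0.0, 0.5, size=x_wm.shape)
x_proj = project_linf(x_wm, x_adv)
print(np.abs(x_proj - x_wm).max())  # at most 0.25 (up to float rounding)
```

Projecting after every update step is the usual way an attack is held to a fixed budget, which is what makes comparisons "under the same $\ell_\infty$ norm perturbation budget" meaningful.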

Paper Structure

This paper contains 43 sections, 11 equations, 14 figures, 3 tables.

Figures (14)

  • Figure 1: Watermarked images generated by Stable Diffusion and their perturbed versions in different attacks that successfully evade detection.
  • Figure 2: Evaluation of hu2024transfer's attack against different watermarking methods. The attack succeeds only when the target model's method matches the surrogate models' method (HiDDeN).
  • Figure 3: Evasion rate comparing hu2024transfer and OFT ($k=1$): OFT is superior in most configurations.
  • Figure 4: Bit-wise accuracy comparing hu2024transfer and OFT ($k=1$): OFT is superior in most configurations.
  • Figure 5: Evasion rate comparing hu2024transfer and OFT ($k=1$) on MidJourney: OFT is superior in most configurations, consistent with the DiffusionDB results.
  • ...and 9 more figures
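Figures 3 to 5 report evasion rate and bit-wise accuracy. The sketch below shows one common way to compute these two metrics, assuming each watermark is a binary message decoded bit by bit and that an image evades detection when its bit-wise accuracy drops below a single-tail threshold `tau`. The threshold value `0.8` and the function names are hypothetical placeholders, not values or code taken from the paper.

```python
import numpy as np


def bitwise_accuracy(decoded: np.ndarray, message: np.ndarray) -> np.ndarray:
    """Per-image fraction of decoded bits that match the embedded message.

    decoded, message: {0, 1} arrays of shape (n_images, n_bits).
    """
    return (decoded == message).mean(axis=1)


def evasion_rate(decoded: np.ndarray, message: np.ndarray, tau: float = 0.8) -> float:
    """Fraction of images no longer flagged as watermarked.

    Assumes a hypothetical single-tail detector that flags an image when its
    bit-wise accuracy is at least tau.
    """
    return float((bitwise_accuracy(decoded, message) < tau).mean())


# Usage: two attacked images carrying the same 8-bit message.
message = np.array([[1, 0, 1, 1, 0, 0, 1, 0]] * 2)
decoded = np.array([
    [1, 0, 1, 1, 0, 0, 1, 0],  # all 8 bits recovered -> still detected
    [0, 1, 0, 1, 0, 0, 1, 0],  # 5 of 8 bits recovered -> evades at tau = 0.8
])
print(evasion_rate(decoded, message))  # 0.5
```

Lower bit-wise accuracy and higher evasion rate both indicate a stronger attack; the two metrics are linked only through the chosen threshold, which is why the sketch exposes `tau` as a parameter.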