The Efficacy of Transfer-based No-box Attacks on Image Watermarking: A Pragmatic Analysis
Qilong Wu, Varun Chandrasekaran
TL;DR
The paper investigates image watermarking robustness under the no-box threat model, where attackers lack knowledge about the victim model. It critically evaluates transfer-based evasion attacks, showing that their success hinges on close alignment between the victim and surrogate models and on heavy computation, and that relaxing these assumptions severely limits effectiveness (an evasion rate of at most $21.1\%$). The authors propose Optimization-Free Transfer (OFT), a simple, single-surrogate attack that achieves comparable or better evasion at far lower cost, and demonstrate its practical viability across multiple watermarking methods and datasets. Their findings imply that no-box attacks may be less practical than previously claimed, prompting more realistic evaluation standards and stronger watermarking defenses. The work provides both empirical guidance and a publicly available implementation to facilitate future research and benchmarking.
Abstract
Watermarking approaches are widely used to identify whether images in circulation are authentic or AI-generated. Determining the robustness of image watermarking methods in the ``no-box'' setting, where the attacker is assumed to have no knowledge about the watermarking model, is an interesting problem. Our main finding is that evading watermark detection in the no-box setting is challenging: the success of optimization-based transfer attacks (which involve training surrogate models) proposed in prior work~\cite{hu2024transfer} depends on impractical assumptions, including (i) aligning the architecture and training configurations of the victim and the attacker's surrogate watermarking models, and (ii) using a large number of surrogate models with potentially large computational requirements. Relaxing these assumptions, i.e., moving to a more pragmatic threat model, results in a failed attack, with an evasion rate of at most $21.1\%$. We show that when the configuration is mostly aligned, OFT, a simple non-optimization attack we propose, can already exceed the success of optimization-based efforts while using a single surrogate model. Under the same $\ell_\infty$-norm perturbation budget of $0.25$, the prior attack of \citet{hu2024transfer} is comparable to or worse than OFT in $11$ out of $12$ configurations and has a limited advantage in the remaining one. The code used for all our experiments is available at \url{https://github.com/Ardor-Wu/transfer}.
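To make the two quantities in the abstract concrete, the following is a minimal sketch of how an $\ell_\infty$ perturbation budget and an evasion rate could be computed. It is not taken from the authors' repository: the helper names (`project_linf`, `linf_distance`, `evasion_rate`), the flattened-list image representation, and the definition of evasion rate as the fraction of attacked watermarked images that a detector no longer flags are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Image = List[float]  # flattened pixel values; the value range is an assumption


def project_linf(original: Sequence[float], perturbed: Sequence[float], budget: float) -> Image:
    """Clamp every pixel of `perturbed` to lie within `budget` of `original`."""
    return [min(max(p, o - budget), o + budget) for o, p in zip(original, perturbed)]


def linf_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute per-pixel difference between two images."""
    return max(abs(x - y) for x, y in zip(a, b))


def evasion_rate(detector: Callable[[Image], bool], attacked: Sequence[Image]) -> float:
    """Fraction of attacked images for which `detector` returns False.

    `detector` is a hypothetical stand-in for a watermark detector that
    returns True when it still finds the watermark.
    """
    if not attacked:
        raise ValueError("need at least one attacked image")
    evaded = sum(1 for img in attacked if not detector(img))
    return evaded / len(attacked)


# Toy usage with a made-up detector (illustrative only).
clean = [0.0, 0.5]
candidate = [1.0, 0.4]
projected = project_linf(clean, candidate, budget=0.25)


def toy_detector(img: Image) -> bool:
    return sum(img) > 1.0


rate = evasion_rate(toy_detector, [[0.2, 0.3], [0.9, 0.9], [0.1, 0.1], [0.6, 0.6]])
```

Under this reading, a reported evasion rate of $21.1\%$ would mean that roughly one in five attacked watermarked images escapes detection while every perturbation stays within the stated budget.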
