The Good, the Bad and the Ugly: Meta-Analysis of Watermarks, Transferable Attacks and Adversarial Defenses
Grzegorz Głuch, Berkant Turan, Sai Ganesh Nagarajan, Sebastian Pokutta
TL;DR
This work formalizes the trade-off between backdoor-based watermarks and adversarial defenses as an interactive verifier–prover protocol and identifies a third, critical option: transferable attacks. It proves that for any learnable task at error $\epsilon$, at least one of a Watermark, a Defense, or a Transferable Attack must exist, and it provides a cryptographic construction of transferable attacks based on Fully Homomorphic Encryption. The authors further show that tasks of bounded VC-dimension admit adversarial defenses (and, under some conditions, watermarks), while more flexible tasks can host transferable attacks, whose existence in turn implies cryptographic primitives (EFID pairs and pseudorandom generators, PRGs). A key takeaway is a resource-based rule of thumb: when a defense exists, a defender budget of $T^2$ computation suffices to realize it; if such a budget fails, a transferable attack is present, which makes watermarks unlikely. Overall, the paper bridges learning theory, cryptography, and security, offering a new lens on ownership verification, robustness, and attack transferability in classification tasks.
Abstract
We formalize and analyze the trade-off between backdoor-based watermarks and adversarial defenses, framing it as an interactive protocol between a verifier and a prover. While previous works have primarily focused on this trade-off, our analysis extends it by identifying transferable attacks as a third, counterintuitive, but necessary option. Our main result shows that for all learning tasks, at least one of the three exists: a watermark, an adversarial defense, or a transferable attack. By transferable attack, we refer to an efficient algorithm that generates queries indistinguishable from the data distribution and capable of fooling all efficient defenders. Using cryptographic techniques, specifically fully homomorphic encryption, we construct a transferable attack and prove its necessity in this trade-off. Finally, we show that tasks of bounded VC-dimension allow adversarial defenses against all attackers, while a subclass allows watermarks secure against fast adversaries.
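To make the shape of the verifier–prover interaction concrete, here is a minimal toy sketch in Python. It is not the paper's formal definition: the one-dimensional threshold task, the function names (`run_round`, `honest_queries`, `boundary_queries`, `imperfect_model`) and the acceptance rule "error rate at most $\epsilon$" are all assumptions made for illustration. In particular, the boundary-concentrated queries below are easy to tell apart from the data distribution, so they do not satisfy the indistinguishability requirement of a transferable attack, and no computational bounds are modeled.

```python
import random


def ground_truth(x):
    """Hypothetical toy task: label is 1 iff x lies above 0.5."""
    return int(x > 0.5)


def imperfect_model(x):
    """A prover's model with a slightly misplaced threshold (about 2% error on uniform data)."""
    return int(x > 0.52)


def honest_queries(rng, n):
    """Queries drawn from the data distribution, here uniform on [0, 1)."""
    return [rng.random() for _ in range(n)]


def boundary_queries(rng, n):
    """Queries concentrated near the decision boundary (an illustrative hard-query strategy)."""
    return [0.5 + rng.uniform(-0.05, 0.05) for _ in range(n)]


def run_round(make_queries, answer, num_queries, eps, seed=0):
    """One round: the verifier sends queries, the prover answers, the verifier checks the error.

    Returns True if the prover's error rate on the queries is at most eps.
    """
    rng = random.Random(seed)
    queries = make_queries(rng, num_queries)
    answers = [answer(x) for x in queries]
    errors = sum(a != ground_truth(x) for x, a in zip(queries, answers))
    return errors / num_queries <= eps


if __name__ == "__main__":
    print("in-distribution queries:", run_round(honest_queries, imperfect_model, 1000, 0.05))
    print("boundary queries:       ", run_round(boundary_queries, imperfect_model, 1000, 0.05))
```

In this toy setting the same model passes on in-distribution queries and fails on the specially chosen ones. That gap between average-case accuracy and accuracy on chosen queries is what watermarks and transferable attacks exploit, and what an adversarial defense has to close.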
