The Good, the Bad and the Ugly: Meta-Analysis of Watermarks, Transferable Attacks and Adversarial Defenses

Grzegorz Głuch, Berkant Turan, Sai Ganesh Nagarajan, Sebastian Pokutta

TL;DR

This work formalizes the trade-off between backdoor-based watermarks and adversarial defenses as an interactive verifier–prover protocol and identifies a third, critical option: transferable attacks. It proves that for any task learnable to error $\epsilon$, at least one of a Watermark, an Adversarial Defense, or a Transferable Attack must exist, and it provides a cryptography-based construction of transferable attacks via Fully Homomorphic Encryption. The authors further show that tasks of bounded VC-dimension admit adversarial defenses (and, under some conditions, watermarks), while flexible tasks can host transferable attacks that imply rich cryptographic primitives (EFID pairs and PRGs). A key takeaway is a resource-based rule of thumb: allocating $T^2$ computation for the defender suffices to realize a defense when a defense exists; if such a budget fails, a transferable attack is present, making watermarks unlikely. Overall, the paper bridges learning theory, cryptography, and security, offering a new lens on ownership verification, robustness, and attack transferability in classification tasks.

Abstract

We formalize and analyze the trade-off between backdoor-based watermarks and adversarial defenses, framing it as an interactive protocol between a verifier and a prover. While previous works have primarily focused on this trade-off, our analysis extends it by identifying transferable attacks as a third, counterintuitive, but necessary option. Our main result shows that for all learning tasks, at least one of the three exists: a watermark, an adversarial defense, or a transferable attack. By transferable attack, we refer to an efficient algorithm that generates queries indistinguishable from the data distribution and capable of fooling all efficient defenders. Using cryptographic techniques, specifically fully homomorphic encryption, we construct a transferable attack and prove its necessity in this trade-off. Finally, we show that tasks of bounded VC-dimension allow adversarial defenses against all attackers, while a subclass allows watermarks secure against fast adversaries.

Paper Structure

This paper contains 78 sections, 15 theorems, 67 equations, 3 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

For every $\epsilon \in (0,\frac{1}{2})$, $S : \mathbb{N} \rightarrow \mathbb{N}$, and learning task $\mathbb{L}$ learnable to error $\epsilon$ with high confidence with circuit complexity $S(n)$, at least one of these three exists. (Footnote: We remark that formally the existence does not hold for all suffic…)

Figures (3)

  • Figure 1: Schematic overview of the interaction structure, along with short, informal versions of our definitions of (a) Watermark (Definition \ref{def:watermarkfull}), (b) Adversarial Defense (Definition \ref{def:defense3}), and (c) Transferable Attack (Definition \ref{def:transferableadvexp}), with (c) tied to cryptography (see Section \ref{sec:crypto}).
  • Figure 2: The left part of the figure represents a Lines on Circle Learning Task $\mathbb{L}^\circ$ with a ground truth function denoted by $h_w$. On the right, we define a cryptography-augmented learning task derived from $\mathbb{L}^\circ$. In its distribution, a "clear" or an "encrypted" sample is observed with equal probability. Given their respective times, both $\mathbf{A}$ and $\mathbf{B}$ are able to learn low-error classifiers $h^\mathbf{A}$ and $h^\mathbf{B}$, respectively, by learning only on the clear samples. $\mathbf{A}$ is able to compute a Transferable Attack by computing an encryption of a point close to the decision boundary of her classifier $h^\mathbf{A}$.
  • Figure 3: Overview of the taxonomy of learning tasks, illustrating the presence of Watermarks, Adversarial Defenses, and Transferable Attacks for learning tasks of bounded VC dimension. The axes represent the size bound for the parties in the corresponding schemes. The blue regions depict positive results, the red regions negative results, and the gray regions parameter regimes that are not of interest. See Lemmas \ref{lem:VCdefense} and \ref{lem:VCwatermark} for details about the blue regions. The curved line represents a potential application of Theorem \ref{thm:maininformal}, which says that at least one of the three points should be blue.
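
The "clear" half of the task in Figure 2 can be illustrated with a small toy sketch. It is not the paper's construction. It assumes, since the formal definition is not reproduced here, that points are drawn uniformly from the unit circle and labeled by a halfspace through the origin with normal $w$. The learner (`learn`), the helper `boundary_point`, and all parameter values are hypothetical choices for illustration, and the Fully Homomorphic Encryption step that turns the boundary point into a Transferable Attack is omitted entirely.

```python
import math
import random


def sample_circle(rng):
    """Draw a point uniformly from the unit circle (assumed data distribution)."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (math.cos(theta), math.sin(theta))


def h(w, x):
    """Halfspace classifier h_w: label 1 if <w, x> >= 0, else 0 (assumed form)."""
    return 1 if w[0] * x[0] + w[1] * x[1] >= 0 else 0


def learn(samples):
    """Toy learner: average the points with sign +1/-1 given by their labels.

    For uniform points on the circle this average points along w in expectation.
    """
    sx = sum((2 * y - 1) * x[0] for x, y in samples)
    sy = sum((2 * y - 1) * x[1] for x, y in samples)
    norm = math.hypot(sx, sy)
    return (sx / norm, sy / norm)


def boundary_point(w_hat, margin=1e-3):
    """Return a unit-circle point within `margin` radians of the boundary of h_{w_hat}."""
    angle = math.atan2(w_hat[1], w_hat[0]) + math.pi / 2 - margin
    return (math.cos(angle), math.sin(angle))


rng = random.Random(0)
w_true = sample_circle(rng)  # hidden ground-truth normal vector

# Party A learns a classifier from labeled clear samples.
train = [(x, h(w_true, x)) for x in (sample_circle(rng) for _ in range(2000))]
w_hat = learn(train)

# Empirical error of A's classifier on fresh samples.
test_points = [sample_circle(rng) for _ in range(2000)]
error = sum(h(w_true, x) != h(w_hat, x) for x in test_points) / len(test_points)

# A candidate query close to A's decision boundary; in the paper's construction
# A would send an encryption of such a point (not modeled here).
query = boundary_point(w_hat)
query_margin = w_hat[0] * query[0] + w_hat[1] * query[1]

print(f"empirical error: {error:.4f}")
print(f"query margin w.r.t. learned classifier: {query_margin:.6f}")
```

The intuition the sketch is meant to convey is that points near the learned boundary are where two independently trained low-error classifiers are most likely to disagree with the ground truth. Hiding such a point from the defender is the job of the encryption step, which this toy does not model.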

Theorems & Definitions (39)

  • Definition 1: Learning Task (Informal)
  • Definition 2: Computationally Bounded Learnability (Informal)
  • Theorem 1: Main Theorem, informal
  • Proof (Sketch)
  • Theorem 2: Transferable Attack for a Cryptography-based Learning Task, informal
  • Proof (Sketch)
  • Theorem 3: Transferable Attacks imply EFID pairs, informal
  • Lemma 1: Adversarial Defense for bounded VC-dimension, informal
  • Lemma 2: Watermark for bounded VC-dimension against fast adversaries, informal
  • Definition 6: Learning Task
  • ...and 29 more