A Game Between the Defender and the Attacker for Trigger-based Black-box Model Watermarking
Chaoyue Huang, Hanzhou Wu
TL;DR
This work addresses the theoretical foundation of trigger-based black-box DNN watermarking by framing the interaction between defender and attacker as a cooperative-adversarial game. It defines a payoff structure that accounts for ownership defense costs, attack costs, and competitive rewards, and expresses performance through CSR and ASR with the simplified relation $\mathrm{CSR}_{i,j} = (1-\beta_j)(1-\lambda\alpha_i) + \beta_j r_{i,j}$. By reducing the strategy space to two pure strategies per player, the authors derive a two-by-two mixed-strategy game and outline conditions for Nash equilibria, showing that the defender's optimal response depends on robustness differentials and attack strength. The results offer theoretical guidance for designing robust watermarking schemes and motivate future work on trigger-set design, practical validation, and extensions to generative-model watermarking in secure ML systems.
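As a minimal sketch, the simplified CSR relation above can be evaluated numerically. The function name `csr` and every numeric value below are illustrative assumptions; this summary does not define the symbols, so the inputs are arbitrary placeholders rather than values from the paper.

```python
def csr(alpha_i: float, beta_j: float, lam: float, r_ij: float) -> float:
    """Evaluate CSR_{i,j} = (1 - beta_j) * (1 - lam * alpha_i) + beta_j * r_ij."""
    return (1.0 - beta_j) * (1.0 - lam * alpha_i) + beta_j * r_ij


# Arbitrary placeholder inputs (assumptions, not taken from the paper).
example = csr(alpha_i=0.5, beta_j=0.2, lam=0.1, r_ij=0.9)
print(f"CSR = {example:.4f}")
```

Two limiting cases follow directly from the algebra: with $\beta_j = 0$ the expression reduces to $1 - \lambda\alpha_i$, and with $\beta_j = 1$ it reduces to $r_{i,j}$.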
Abstract
Watermarking deep neural network (DNN) models has attracted a great deal of attention in recent years because of the increasing demand to protect the intellectual property of DNN models. Many practical algorithms have been proposed that covertly embed a secret watermark into a given DNN model, through either parametric/structural modulation or backdooring, to guard against intellectual property infringement by an attacker while preserving the model's performance on its original task. Despite the performance of these approaches, the lack of basic research restricts algorithmic design to either trial-based methods or data-driven techniques. This motivates us to introduce a game between the model attacker and the model defender for trigger-based black-box model watermarking. For each of the two players, we construct the payoff function and determine the optimal response, which enriches the theoretical foundation of model watermarking and may inspire novel schemes in the future.
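To illustrate what determining an optimal response in a two-by-two mixed-strategy game involves, the following sketch computes the fully mixed Nash equilibrium of a generic bimatrix game from the standard indifference conditions. It is not the paper's derivation: the function name `interior_mixed_equilibrium`, the assignment of the defender to rows and the attacker to columns, and the placeholder payoffs (matching pennies) are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def interior_mixed_equilibrium(a: Matrix, b: Matrix) -> Optional[Tuple[float, float]]:
    """Return (p, q) for a fully mixed equilibrium of a 2x2 bimatrix game.

    a[i][j] is the row player's payoff and b[i][j] the column player's payoff
    when the row player picks strategy i and the column player picks j.
    p is the probability of row 0, q the probability of column 0.
    Returns None if no equilibrium with 0 < p < 1 and 0 < q < 1 exists.
    """
    # p must make the column player indifferent between columns 0 and 1.
    den_p = b[0][0] - b[0][1] - b[1][0] + b[1][1]
    # q must make the row player indifferent between rows 0 and 1.
    den_q = a[0][0] - a[0][1] - a[1][0] + a[1][1]
    if den_p == 0 or den_q == 0:
        return None
    p = (b[1][1] - b[1][0]) / den_p
    q = (a[1][1] - a[0][1]) / den_q
    if not (0 < p < 1 and 0 < q < 1):
        return None
    return p, q


# Placeholder payoffs (matching pennies), NOT the payoffs defined in the paper.
defender_payoff = [[1.0, -1.0], [-1.0, 1.0]]
attacker_payoff = [[-1.0, 1.0], [1.0, -1.0]]
print(interior_mixed_equilibrium(defender_payoff, attacker_payoff))
```

When one player has a strictly dominant pure strategy, the indifference conditions yield probabilities outside the open interval (0, 1) and the function returns `None`; in that case the equilibrium lies in pure strategies and has to be found by comparing the payoffs directly.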
