DeepTracer: Tracing Stolen Model via Deep Coupled Watermarks

Yunfei Yang, Xiaojun Chen, Yuexin Xuan, Zhendong Zhao, Xin Zhao, He Li

TL;DR

DeepTracer addresses the vulnerability of existing black-box watermarks to model stealing by analyzing why watermarks are forgotten when a model is stolen and by tightly coupling the watermark task with the primary task. It introduces a four-stage framework: watermark sample construction, coupled watermark embedding with a same-class coupling loss, watermark key sample generation via a two-stage filtering mechanism, and black-box ownership verification. The method reports state-of-the-art robustness across multiple datasets and attack types, maintaining high watermark success rates while preserving primary-task accuracy and resisting watermark removal and detection attacks. This has practical implications for reliable ownership verification in AI services and strengthens protection against unauthorized model replication.

Abstract

Model watermarking techniques embed watermark information into a protected model for ownership declaration by constructing specific input-output pairs. However, existing watermarks are easily removed under model stealing attacks, which makes it difficult for model owners to verify the copyright of stolen models. In this paper, we analyze the root cause of the failure of current watermarking methods under model stealing scenarios and then explore potential solutions. Specifically, we introduce a robust watermarking framework, DeepTracer, which leverages a novel watermark sample construction method and a same-class coupling loss constraint. DeepTracer produces a model in which the watermark task and the primary task are tightly coupled, so that adversaries inevitably learn the hidden watermark task when stealing the primary task functionality. Furthermore, we propose an effective watermark sample filtering mechanism that carefully selects the watermark key samples used in model ownership verification to enhance the reliability of watermarks. Extensive experiments across multiple datasets and models demonstrate that our method surpasses existing approaches in defending against various model stealing attacks, as well as watermark attacks, and achieves new state-of-the-art effectiveness and robustness.
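To make the black-box verification step concrete, here is a minimal sketch of how an owner might query a suspect model with saved key samples and measure how often it returns the watermark target label. It is not the paper's procedure: the function names, the callable `query_model` interface, the toy models, and the 0.5 decision threshold are all assumptions made for illustration.

```python
from typing import Callable, Hashable, Sequence, TypeVar

Sample = TypeVar("Sample")


def watermark_success_rate(
    query_model: Callable[[Sample], Hashable],
    key_samples: Sequence[Sample],
    target_label: Hashable,
) -> float:
    """Fraction of key samples that the suspect model maps to the target label.

    `query_model` stands in for black-box access: it takes one input and
    returns only the predicted label (hypothetical interface).
    """
    if not key_samples:
        raise ValueError("key_samples must be non-empty")
    hits = sum(1 for x in key_samples if query_model(x) == target_label)
    return hits / len(key_samples)


def verify_ownership(
    query_model: Callable[[Sample], Hashable],
    key_samples: Sequence[Sample],
    target_label: Hashable,
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
) -> bool:
    """Claim ownership when the success rate reaches the threshold."""
    rate = watermark_success_rate(query_model, key_samples, target_label)
    return rate >= threshold


# Toy stand-ins for suspect models (purely illustrative).
def stolen_model(x: int) -> int:
    return 7  # always answers with the watermark target label


def independent_model(x: int) -> int:
    return x % 10  # unrelated behaviour; hits the target only by chance


key_samples = list(range(20))
target_label = 7

stolen_rate = watermark_success_rate(stolen_model, key_samples, target_label)
independent_rate = watermark_success_rate(independent_model, key_samples, target_label)
```

In practice the threshold would be chosen from the success rate that independently trained models reach on the same key samples, so that a positive result is unlikely to occur by chance.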

Paper Structure

This paper contains 49 sections, 29 equations, 20 figures, 23 tables.

Figures (20)

  • Figure 1: Heatmap of activation within the neural network for different watermarking methods. Lighter colors indicate greater activation.
  • Figure 2: Overview of DeepTracer. The model owner first adaptively selects four source classes and one target label, and then constructs watermark samples and mixes them into the normal dataset for model training. Next, the owner generates a filtered key samples set for the watermarked model and saves it, which is used for future ownership verification of suspect models.
  • Figure 3: Robustness against four fine-tuning attacks on stolen model.
  • Figure 4: Robustness against pruning attack on stolen model.
  • Figure 5: Effect of class selection strategy.
  • ...and 15 more figures