Neural Honeytrace: Plug&Play Watermarking Framework against Model Extraction Attacks

Yixiao Xu, Binxing Fang, Rui Wang, Yinghai Zhou, Yuan Liu, Mohan Li, Zhihong Tian

TL;DR

This paper proposes Neural Honeytrace, a plug-and-play watermarking framework that operates without retraining. It uses a training-free multi-step transmission strategy that leverages the long-tailed effect of backdoor learning to achieve efficient and robust watermark embedding.

Abstract

Triggerable watermarking enables model owners to assert ownership against model extraction attacks. However, most existing approaches require additional training, which limits post-deployment flexibility, and the lack of clear theoretical foundations makes them vulnerable to adaptive attacks. In this paper, we propose Neural Honeytrace, a plug-and-play watermarking framework that operates without retraining. We redefine the watermark transmission mechanism from an information perspective, designing a training-free multi-step transmission strategy that leverages the long-tailed effect of backdoor learning to achieve efficient and robust watermark embedding. Extensive experiments demonstrate that Neural Honeytrace reduces the average number of queries required for a worst-case t-test-based ownership verification to as low as $2\%$ of existing methods, while incurring zero training cost.
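
To make the notion of t-test-based ownership verification concrete, the following is a minimal sketch and not the paper's actual procedure. It assumes the verifier collects a suspect model's confidence on a watermark target class for triggered queries and for clean queries, then checks with a one-sided Welch t-test whether the triggered confidences are significantly higher. The function names, the synthetic scores, the significance level `alpha`, and the normal approximation to the t distribution are all assumptions made for illustration. The number of queries in each list is the sample size whose reduction the abstract refers to.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_t(triggered, clean):
    """Welch's t statistic for mean(triggered) - mean(clean)."""
    n1, n2 = len(triggered), len(clean)
    # Standard error of the difference, without assuming equal variances.
    se = math.sqrt(variance(triggered) / n1 + variance(clean) / n2)
    return (mean(triggered) - mean(clean)) / se


def verify_ownership(triggered, clean, alpha=0.05):
    """One-sided test: is target-class confidence higher on triggered queries?

    Uses a normal approximation to the t distribution, which is only
    reasonable for moderately large samples (an assumption of this sketch).
    """
    t = welch_t(triggered, clean)
    p_value = 1.0 - NormalDist().cdf(t)
    return t, p_value, p_value < alpha


# Synthetic stand-ins for a suspect model's target-class confidence
# (hypothetical data, not results from the paper).
rng = random.Random(0)
clean_scores = [rng.gauss(0.10, 0.05) for _ in range(200)]
triggered_scores = [rng.gauss(0.30, 0.05) for _ in range(200)]

t_stat, p_value, claimed = verify_ownership(triggered_scores, clean_scores)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}, ownership claimed: {claimed}")
```

In this setup, a watermark that transfers more strongly to the extracted model widens the gap between the two score distributions, so the test reaches significance with fewer queries.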

Paper Structure (27 sections, 16 equations, 12 figures, 13 tables)

Figures (12)

  • Figure 1: Sample size required for ownership verification.
  • Figure 2: The long-tailed effect of an MEA-Defender [Lv24MEA-Defender] watermarked ResNet-18 model on CIFAR-10.
  • Figure 3: Watermark transmission model.
  • Figure 4: Overview of the workflow of Neural Honeytrace.
  • Figure 5: Hyperparameter selection on CIFAR-10. Neural Honeytrace with different query sample sizes, $d$, $\alpha$, and $\beta$.
  • ...and 7 more figures