Neural Honeytrace: Plug&Play Watermarking Framework against Model Extraction Attacks

Yixiao Xu; Binxing Fang; Rui Wang; Yinghai Zhou; Yuan Liu; Mohan Li; Zhihong Tian

TL;DR

This paper proposes Neural Honeytrace, a plug-and-play watermarking framework that requires no retraining. Its training-free, multi-step transmission strategy exploits the long-tailed effect of backdoor learning to embed watermarks efficiently and robustly.

Abstract

Triggerable watermarking enables model owners to assert ownership against model extraction attacks. However, most existing approaches require additional training, which limits post-deployment flexibility, and their lack of clear theoretical foundations leaves them vulnerable to adaptive attacks. In this paper, we propose Neural Honeytrace, a plug-and-play watermarking framework that operates without retraining. We redefine the watermark transmission mechanism from an information perspective and design a training-free multi-step transmission strategy that leverages the long-tailed effect of backdoor learning to achieve efficient and robust watermark embedding. Extensive experiments demonstrate that Neural Honeytrace reduces the average number of queries required for a worst-case t-test-based ownership verification to as little as $2\%$ of that required by existing methods, while incurring zero training cost.
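
To make the idea of t-test-based ownership verification concrete, the following is a minimal sketch, not the paper's actual procedure. It assumes the owner records the suspect model's confidence in the watermark target label on triggered queries and on clean queries, then tests whether the triggered mean is significantly higher. The function names, the significance level, and the sample values are hypothetical, and the p-value uses a normal approximation to the t distribution, which is only reasonable for larger query sets.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_statistic(triggered, clean):
    """Welch's t statistic for the hypothesis mean(triggered) > mean(clean).

    Both inputs are lists of target-label confidences returned by the
    suspect model. Each list needs at least two values and the combined
    variance must be non-zero.
    """
    standard_error = sqrt(variance(triggered) / len(triggered)
                          + variance(clean) / len(clean))
    return (mean(triggered) - mean(clean)) / standard_error


def verify_ownership(triggered, clean, significance=0.05):
    """Return (claim_ownership, approximate one-sided p-value).

    Assumption: the t distribution is approximated by a standard normal,
    which keeps the sketch dependency-free but is optimistic for very
    small samples.
    """
    t_stat = welch_t_statistic(triggered, clean)
    p_value = 1.0 - NormalDist().cdf(t_stat)
    return p_value < significance, p_value


# Hypothetical confidences in the target label (illustrative values only).
triggered_scores = [0.62, 0.71, 0.58, 0.66, 0.69, 0.74, 0.60, 0.65]
clean_scores = [0.11, 0.09, 0.14, 0.10, 0.12, 0.08, 0.13, 0.10]

claimed, p = verify_ownership(triggered_scores, clean_scores)
print(f"ownership claimed: {claimed}, approximate p-value: {p:.3g}")
```

The fewer queries needed before the test rejects the null hypothesis, the cheaper verification becomes, which is the quantity the abstract reports as the number of required queries.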

Paper Structure (27 sections, 16 equations, 12 figures, 13 tables)

Introduction
Background
Model Extraction Attack
Triggerable Watermarking
Hypothesis Test
Threat Model
Watermark Transmission Model
Neural Honeytrace
Training-free Watermark Embedding
Multi-step Watermark Transmission
Experiments
Experiment Setup
Experimental Results
Ablation Study
Related Work
...and 12 more sections

Figures (12)

Figure 1: Sample size required for ownership verification.
Figure 2: The long-tailed effect of a ResNet-18 model watermarked with MEA-Defender (Lv et al., 2024) on CIFAR-10.
Figure 3: Watermark transmission model.
Figure 4: Overview of the workflow of Neural Honeytrace.
Figure 5: Hyperparameter selection on CIFAR-10. Neural Honeytrace with different query sample size, $d$, $\alpha$, and $\beta$.
...and 7 more figures
