Neural Honeytrace: Plug&Play Watermarking Framework against Model Extraction Attacks
Yixiao Xu, Binxing Fang, Rui Wang, Yinghai Zhou, Yuan Liu, Mohan Li, Zhihong Tian
TL;DR
This paper proposes Neural Honeytrace, a plug-and-play watermarking framework that operates without retraining, designing a training-free multi-step transmission strategy that leverages the long-tailed effect of backdoor learning to achieve efficient and robust watermark embedding.
Abstract
Triggerable watermarking enables model owners to assert ownership against model extraction attacks. However, most existing approaches require additional training, which limits post-deployment flexibility, and the lack of clear theoretical foundations makes them vulnerable to adaptive attacks. In this paper, we propose Neural Honeytrace, a plug-and-play watermarking framework that operates without retraining. We redefine the watermark transmission mechanism from an information perspective, designing a training-free multi-step transmission strategy that leverages the long-tailed effect of backdoor learning to achieve efficient and robust watermark embedding. Extensive experiments demonstrate that Neural Honeytrace reduces the average number of queries required for a worst-case t-test-based ownership verification to as low as $2\%$ of existing methods, while incurring zero training cost.
