Performance-lossless Black-box Model Watermarking
Na Zhao, Kejiang Chen, Weiming Zhang, Nenghai Yu
TL;DR
The paper tackles intellectual-property protection for high-cost models accessed through black-box APIs by proposing BranchWM, a lossless watermarking framework that decouples the primary task from a forensic branch. It introduces a MAC-based MUFT trigger and proves performance-losslessness by reduction to EUF-CMA security, addressing both completeness and soundness. The work provides a formal threat model, a rigorous protocol definition, and a language-model instantiation, along with analyses of interference attacks and practical security enhancements. The approach offers a practical, provable means of watermarking remote models with at most negligible impact on primary-task utility, enabling verifiable ownership in API-based ecosystems.
Abstract
With the development of deep learning, high-value and high-cost models have become valuable assets, and techniques for protecting their intellectual property have attracted growing attention. However, existing black-box model watermarking methods mainly derive from training-based backdoors, which may degrade primary-task performance. To address this, we propose a branch backdoor-based model watermarking protocol to protect model intellectual property, in which, after a comparative analysis of secure cryptographic primitives, a construction based on a message authentication scheme is adopted as the branch indicator. We prove by reduction that the protocol is performance-lossless. In addition, we analyze potential threats to the protocol and provide a secure and feasible watermarking instance for language models.
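The branch-indicator idea can be illustrated with a minimal sketch. It assumes HMAC-SHA256 as the message authentication scheme and a hypothetical convention in which a trigger query is a message followed by the delimiter `||` and a hex-encoded tag; the names `make_trigger`, `is_trigger`, and `BranchWatermarkedModel` are illustrative only and do not reproduce the paper's MUFT construction. The point it shows: any query without a valid tag is answered by the unmodified primary model, so a user without the key reaches the forensic branch only by forging a MAC tag.

```python
import hashlib
import hmac
from typing import Callable

TAG_SEP = "||"    # hypothetical delimiter between the query text and its tag
TAG_LEN_HEX = 64  # length of an HMAC-SHA256 tag in hex characters


def make_trigger(key: bytes, message: str) -> str:
    """Owner side (assumed scheme): append an HMAC-SHA256 tag to form a trigger query."""
    tag = hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{message}{TAG_SEP}{tag}"


def is_trigger(key: bytes, query: str) -> bool:
    """Branch indicator: True only if the query ends with a valid tag on its prefix."""
    message, sep, tag = query.rpartition(TAG_SEP)
    if not sep or len(tag) != TAG_LEN_HEX:
        return False
    expected = hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()
    # Compare as bytes: constant-time, and safe even if the tag contains non-ASCII text.
    return hmac.compare_digest(expected.encode("ascii"), tag.encode("utf-8"))


class BranchWatermarkedModel:
    """Hypothetical wrapper: routes verified triggers to a forensic branch."""

    def __init__(self, primary: Callable[[str], str], key: bytes, watermark_response: str):
        self.primary = primary
        self.key = key
        self.watermark_response = watermark_response

    def __call__(self, query: str) -> str:
        if is_trigger(self.key, query):
            return self.watermark_response  # forensic branch (ownership evidence)
        return self.primary(query)          # primary task, output left untouched


if __name__ == "__main__":
    key = b"owner-secret-key"
    model = BranchWatermarkedModel(lambda q: q.upper(), key, "WM:owner-id")
    print(model("hello"))                     # ordinary query -> primary model
    print(model(make_trigger(key, "hello")))  # owner's trigger -> watermark response
```

Because the primary model is never retrained in this sketch, its behavior on ordinary inputs is unchanged by construction; the only way the watermark can interfere is if a non-owner query happens to carry a valid tag, which is exactly what unforgeability of the MAC rules out.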
