Performance-lossless Black-box Model Watermarking

Na Zhao, Kejiang Chen, Weiming Zhang, Nenghai Yu

TL;DR

The paper tackles intellectual-property protection for high-cost models accessed through black-box APIs by proposing BranchWM, a lossless watermarking framework that decouples the primary task from a forensic branch. It introduces a MAC-based MUFT trigger and proves performance-losslessness via reduction to EUF-CMA security, addressing both completeness and soundness. The work provides a formal threat model, a rigorous protocol definition, and a language-model instantiation, along with analyses of interference attacks and practical security enhancements. The approach offers a practical, provable means to watermark remote models with minimal impact on utility, enabling verifiable ownership in API-based ecosystems.

Abstract

With the development of deep learning, high-value, high-cost models have become valuable assets, and technologies for protecting their intellectual property have become a hot topic. However, existing model watermarking work in black-box scenarios mainly originates from training-based backdoor methods, which can degrade primary-task performance. To address this, we propose a branch backdoor-based model watermarking protocol to protect model intellectual property, in which a construction based on a message authentication scheme is adopted as the branch indicator after a comparative analysis against other secure cryptographic primitives. We prove by reduction that the protocol is performance-lossless. In addition, we analyze the potential threats to the protocol and provide a secure and feasible watermarking instance for language models.

Paper Structure

This paper contains 37 sections, 4 equations, 4 figures, 2 tables, and 6 algorithms.

Figures (4)

  • Figure 1: The threat scenario involves three roles. The model development team provides a black-box deployment to the application service provider, which may then serve end users dishonestly. The two cases of dishonesty are summarized in the red zone.
  • Figure 2: The IP verification protocol has two modules. The outer module, deployed outside the model API, generates triggers to obtain model copyright evidence and verifies that evidence. The inner module, added as a branch inside the model API, checks whether an input is a trigger and generates the copyright evidence.
  • Figure 3: We use a MAC scheme that is EUF-CMA secure to construct the Trigger Generator and Trigger Detector sub-modules. This construction generates a MUFT that satisfies the DUL-CSA property.
  • Figure 4: Interference attacks in two scenarios: (1) the attacker passes off a low-cost model B as high-cost model A and tries to filter the input or forge the output; (2) the attacker falsely claims to have developed high-cost model B, which is actually model A, and tries to filter the input or erase the watermark signal in the output.
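
The two-module design in Figures 2 and 3 can be illustrated with a minimal Python sketch. It is not the paper's construction: it assumes HMAC-SHA256 as the EUF-CMA-secure MAC, a byte-string trigger of the form nonce || tag, and evidence derived from the same key. All function names (generate_trigger, is_trigger, evidence_for, watermarked_api, verify_ownership) and the toy primary_model are hypothetical.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32  # length in bytes of an HMAC-SHA256 tag


def generate_trigger(key: bytes, nonce_len: int = 16) -> bytes:
    """Outer module (assumed form): trigger = nonce || HMAC(key, nonce)."""
    nonce = secrets.token_bytes(nonce_len)
    tag = hmac.new(key, nonce, hashlib.sha256).digest()
    return nonce + tag


def is_trigger(key: bytes, query: bytes) -> bool:
    """Inner module: accept only inputs whose trailing tag verifies."""
    if len(query) <= TAG_LEN:
        return False
    nonce, tag = query[:-TAG_LEN], query[-TAG_LEN:]
    expected = hmac.new(key, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)  # constant-time comparison


def primary_model(query: bytes) -> bytes:
    """Toy stand-in for the protected model's primary task."""
    return b"answer:" + query[::-1]


def evidence_for(key: bytes, query: bytes) -> bytes:
    """Hypothetical copyright evidence bound to the trigger."""
    return hmac.new(key, b"evidence" + query, hashlib.sha256).digest()


def watermarked_api(key: bytes, query: bytes) -> bytes:
    """Branch: triggers go to the forensic path, everything else is untouched."""
    if is_trigger(key, query):
        return evidence_for(key, query)
    return primary_model(query)


def verify_ownership(key: bytes, api) -> bool:
    """Outer module: send a fresh trigger and check the returned evidence."""
    trigger = generate_trigger(key)
    return hmac.compare_digest(api(trigger), evidence_for(key, trigger))
```

In this sketch the primary model is never modified or retrained, so any query that does not carry a valid tag receives exactly the output it would have received without the watermark. A party without the key can only reach the forensic branch by forging a MAC tag, which is the intuition behind tying losslessness to EUF-CMA security.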