Robust and Minimally Invasive Watermarking for EaaS

Zongqi Wang, Baoyuan Wu, Jingyuan Deng, Yujiu Yang

TL;DR

This work tackles IP protection for Embeddings as a Service (EaaS) by addressing the vulnerability of existing watermarks to removal. It introduces ESpeW, an embedding-specific watermarking scheme that selectively replaces small-magnitude components at positions unique to each embedding, using a private target embedding as a secret key. Empirical results on SST2, MIND, AG News, and Enron Spam show that ESpeW maintains embedding quality (less than 1% perturbation) and remains robust against removal strategies such as CSE, while enabling reliable copyright verification with very small p-values (e.g., $p<10^{-11}$). The study also analyzes tradeoffs, scalability, and potential leakage risks, offering a practical and scalable way for benign providers to offer watermark-protected EaaS.

Abstract

Embeddings as a Service (EaaS) is emerging as a crucial component of AI applications. Unfortunately, EaaS is vulnerable to model extraction attacks, highlighting the urgent need for copyright protection. Although some preliminary works propose applying embedding watermarks to protect EaaS, recent research reveals that these watermarks can be easily removed. Hence, it is crucial to inject robust watermarks that resist watermark removal attacks. Existing watermarking methods typically inject a target embedding into embeddings through linear interpolation when the text contains triggers. However, this mechanism results in every watermarked embedding sharing the same component, which makes the watermark easy to identify and eliminate. Motivated by this, we propose a novel embedding-specific watermarking (ESpeW) mechanism to offer robust copyright protection for EaaS. Our approach injects unique, yet readily identifiable, watermarks into each embedding. Watermarks inserted by ESpeW are designed to maintain a significant distance from one another and to avoid sharing common components, making them significantly more challenging to remove. Moreover, ESpeW is minimally invasive, as it reduces the impact on embeddings to less than 1%, setting a new milestone in watermarking for EaaS. Extensive experiments on four popular datasets demonstrate that ESpeW can watermark successfully even against a highly aggressive removal strategy, without sacrificing the quality of embeddings.
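The core idea described above, replacing small-magnitude components at embedding-specific positions with values from a private target embedding, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `espew_like_watermark`, the 25% proportion, the re-normalization step, and the random test vectors are all assumptions made for illustration, and trigger detection is omitted entirely.

```python
import numpy as np

def espew_like_watermark(embedding, target, proportion=0.25):
    """Hypothetical sketch: overwrite the smallest-magnitude components of
    `embedding` with the matching components of `target`, then re-normalize.
    Returns the watermarked embedding and the positions that were replaced."""
    e = np.asarray(embedding, dtype=float)
    t = np.asarray(target, dtype=float)
    k = int(round(proportion * e.size))
    # Positions of the k smallest |e_i|; these depend on the embedding itself,
    # so different embeddings generally get different watermark positions.
    idx = np.argsort(np.abs(e))[:k]
    w = e.copy()
    w[idx] = t[idx]
    return w / np.linalg.norm(w), idx

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def random_unit(rng, dim):
    """Random unit vector standing in for a real embedding (assumption)."""
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
dim = 256
target = random_unit(rng, dim)  # private target embedding (kept secret)
e1 = random_unit(rng, dim)      # two stand-in "original" embeddings
e2 = random_unit(rng, dim)

w1, idx1 = espew_like_watermark(e1, target)
w2, idx2 = espew_like_watermark(e2, target)

print("similarity to target before:", cosine(e1, target))
print("similarity to target after: ", cosine(w1, target))
print("positions shared by both watermarks:", len(set(idx1) & set(idx2)))
```

Because the replaced positions are chosen per embedding, the two watermarked vectors do not carry one common injected component, which is the property the abstract contrasts with linear interpolation of a single target embedding.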

Paper Structure

This paper contains 42 sections, 10 equations, 13 figures, 16 tables, 2 algorithms.

Figures (13)

  • Figure 1: The framework of ESpeW. The upper part presents an overview of watermark injection and model extraction. (1) The stealer queries the provider's EaaS to obtain a dataset that maps texts to embeddings. During this process, the provider injects watermarks. (2) The stealer trains its own model and may apply watermark removal techniques. (3) The provider queries the stealer's EaaS for copyright verification. The lower part offers a detailed explanation of the key modules for watermark insertion and verification.
  • Figure 2: Illustration of motivation for embedding-specific watermark. Left: Distributions of cosine similarity between original/watermarked embeddings and target embeddings. Middle: Calculation processes of watermarking. Right: Shared components among all watermarked embeddings.
  • Figure 3: Ablation results of watermark proportion on SST2. (a) shows results without CSE. (b) shows results with CSE, where $K$ is set to 50.
  • Figure 4: Effect of dropout with a 25% watermark proportion. (a) and (b) show detection results under different dropout rates without CSE. (c) and (d) show detection results under different dropout rates with CSE ($K=50$).
  • Figure 5: Average cosine similarity between watermarked and clean embeddings.
  • ...and 8 more figures