Watermarks for Embeddings-as-a-Service Large Language Models

Anudeex Shetty

TL;DR

This work investigates IP protection for Embeddings-as-a-Service (EaaS) against imitation attacks, revealing that existing watermarking methods based on trigger words are vulnerable to paraphrasing. It introduces Watermarking EaaS with Linear Transformation (WET), a reversible, matrix-based watermark that embeds watermarks via a transformation $\mathbf{T}$ and allows recovery with $\mathbf{T}^{+}$, achieving near-perfect verifiability while preserving embedding utility. The paraphrasing analysis demonstrates dilution of trigger-word watermarks under aggregation of multiple paraphrases, motivating the need for non-textual watermarks. Empirical results show WET maintains high downstream task performance and robust verification across datasets, attacker sizes, and noise conditions, highlighting its potential for practical IP protection in EaaS deployments.

Abstract

Large Language Models (LLMs) have demonstrated exceptional capabilities in natural language understanding and generation. Based on these LLMs, businesses have started to provide Embeddings-as-a-Service (EaaS), offering feature extraction capabilities (in the form of text embeddings) that benefit downstream natural language processing tasks. However, prior research has demonstrated that EaaS is vulnerable to imitation attacks, where an attacker clones the service's model in a black-box manner without access to the model's internal workings. In response, watermarks have been added to the text embeddings to protect the intellectual property of EaaS providers by allowing them to check for model ownership. This thesis focuses on defending against imitation attacks by investigating EaaS watermarks. To achieve this goal, we unveil novel attacks and propose and validate new watermarking techniques. Firstly, we show that existing EaaS watermarks can be removed through paraphrasing the input text when attackers clone the model during imitation attacks. Our study illustrates that paraphrasing can effectively bypass current state-of-the-art EaaS watermarks across various attack setups (including different paraphrasing techniques and models) and datasets in most instances. This demonstrates a new vulnerability in recent EaaS watermarking techniques. Subsequently, as a countermeasure, we propose a novel watermarking technique, WET (Watermarking EaaS with Linear Transformation), which employs linear transformation of the embeddings. Watermark verification is conducted by applying a reverse transformation and comparing the similarity between recovered and original embeddings. We demonstrate its robustness against paraphrasing attacks with near-perfect verifiability. We conduct detailed ablation studies to assess the significance of each component and hyperparameter in WET.
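
The watermark-and-verify loop described above (apply a linear transformation, then reverse it and compare the recovered embedding with the original) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's implementation: the matrix `T` here is a random Gaussian stand-in, the dimensions `n` and `w` are arbitrary, the reverse transformation is taken to be the Moore-Penrose pseudoinverse, and the helper names (`normalise`, `watermark`, `recover`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, w = 8, 12  # original and watermarked dimensions (arbitrary, for illustration)

# Assumption: a random Gaussian matrix stands in for the provider's secret T.
# Such a matrix has full column rank with probability 1, so pinv(T) @ T = I.
T = rng.normal(size=(w, n))
T_pinv = np.linalg.pinv(T)


def normalise(v):
    """Scale each row vector to unit length."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def watermark(e_o, matrix):
    """Inject the watermark: linearly transform, then re-normalise."""
    return normalise(e_o @ matrix.T)


def recover(e_p, matrix_pinv):
    """Reverse the transformation with the pseudoinverse, then re-normalise."""
    return normalise(e_p @ matrix_pinv.T)


# Five made-up unit-length "original" embeddings.
originals = normalise(rng.normal(size=(5, n)))
watermarked = watermark(originals, T)

# Verification with the correct matrix: recovered embeddings match the originals.
recovered = recover(watermarked, T_pinv)
cos_sim = np.sum(originals * recovered, axis=1)

# Contrast: reversing with an unrelated matrix does not recover the originals.
T_other = rng.normal(size=(w, n))
wrong = recover(watermarked, np.linalg.pinv(T_other))
wrong_sim = np.sum(originals * wrong, axis=1)

print("cosine similarity, correct T:", np.round(cos_sim, 4))
print("cosine similarity, wrong T:  ", np.round(wrong_sim, 4))
```

With the correct matrix the cosine similarities come out at 1 up to floating-point error, which is the signal a provider would look for; with an unrelated matrix they do not. How the thesis constructs `T` and sets its verification threshold is not covered by this sketch.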

Paper Structure

This paper contains 97 sections, 1 theorem, 32 equations, 32 figures, 22 tables, 1 algorithm.

Key Result

Theorem 1

Given $P$ watermarked embeddings from $P$ paraphrases for an input text, ${\bm{e}}_{p}^{i} = f({\bm{e}}_{o}^{i})$, where $f$ is a linear transformation function, as defined in Equation \ref{eq:wm-injection} and $i \in [1 \dots P]$. The average of these paraphrase embeddings is equivalent to a linear trans
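
The property this theorem builds on, that a linear map commutes with averaging, can be checked numerically. The sketch below assumes $f$ is purely linear, $f({\bm{e}}) = {\bf T}{\bm{e}}$, with no normalisation step; the matrix `T` and the three "paraphrase" embeddings are made-up values, and the helper names are hypothetical.

```python
import math


def apply_linear(matrix, vector):
    """Compute matrix @ vector for plain Python lists."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def mean_vectors(vectors):
    """Element-wise average of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


# Hypothetical 4x3 transformation matrix (illustrative values only).
T = [
    [1.0, 2.0, 0.0],
    [0.0, 1.0, -1.0],
    [3.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]

# Made-up original embeddings of P = 3 paraphrases of one input text.
originals = [
    [1.0, 0.0, 2.0],
    [0.5, 1.5, -1.0],
    [2.0, -1.0, 0.0],
]

# Left-hand side: watermark each paraphrase embedding, then average.
avg_of_watermarked = mean_vectors([apply_linear(T, e) for e in originals])

# Right-hand side: average the originals, then watermark once.
watermarked_avg = apply_linear(T, mean_vectors(originals))

match = all(
    math.isclose(a, b, abs_tol=1e-12)
    for a, b in zip(avg_of_watermarked, watermarked_avg)
)
print("averaging commutes with the linear map:", match)
```

Under this assumption, averaging paraphrase embeddings cannot cancel the transformation the way it dilutes a trigger-word signal, because the averaged vector is itself the image of some vector under `T`.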

Figures (32)

  • Figure 1: Embeddings-as-a-Service (EaaS) vs. Machine Learning-as-a-Service (MLaaS). For the same text input (a positive movie review), EaaS outputs an embedding (or vector), whereas MLaaS (in this case, a sentiment classifier) outputs a classification label.
  • Figure 2: EaaS Imitation Attack Overview. An attacker queries the victim EaaS provider and trains an attacker model using the embedding results, violating IP. Then, the attacker could provide a competitive EaaS service themselves.
  • Figure 3: Evolution of Language Models. LM stands for language model. Figure adapted from zhao2023survey-LLM.
  • Figure 4: Illustration of pre-trained language model (in this case an EaaS, more in Section \ref{sec:eaas}) embeddings capturing semantic relations in the embedding space (Model: text-similarity-davinci-001). Figure adapted from https://openai.com/index/introducing-text-and-code-embeddings/.
  • Figure 5: An example of embeddings used as input features for task-specific model (a simple NN). Embeddings can be used for downstream tasks such as classification, clustering, anomaly detection, retrieval, visualisation, vector databases, etc.
  • ...and 27 more figures

Theorems & Definitions (2)

  • Theorem 1: WET Effectiveness against Paraphrasing Attack
  • Proof 1