WARDEN: Multi-Directional Backdoor Watermarks for Embedding-as-a-Service Copyright Protection

Anudeex Shetty, Yue Teng, Ke He, Qiongkai Xu

TL;DR

The paper investigates IP protection challenges in Embedding-as-a-Service by introducing a CSE attack that clusters, selects, and eliminates watermark directions to remove EmbMarker without harming embedding utility. To counter this vulnerability, it proposes WARDEN, a defense that uses multi-directional watermarks and a verification protocol to improve robustness against CSE and enable reliable copyright detection. The authors provide extensive experiments across SST2, MIND, AG News, and Enron to demonstrate that EmbMarker can be bypassed, while WARDEN significantly enhances watermark verification and resists elimination and reconstruction attacks. The findings have practical implications for copyright enforcement in EaaS and suggest future work on multi-owner ownership, theoretical guarantees, and more conservative watermarking strategies.

Abstract

Embedding as a Service (EaaS) has become a widely adopted solution, which offers feature extraction capabilities for addressing various downstream tasks in Natural Language Processing (NLP). Prior studies have shown that EaaS can be prone to model extraction attacks; nevertheless, this concern could be mitigated by adding backdoor watermarks to the text embeddings and subsequently verifying the attack models post-publication. Through the analysis of the recent watermarking strategy for EaaS, EmbMarker, we design a novel CSE (Clustering, Selection, Elimination) attack that removes the backdoor watermark while maintaining the high utility of embeddings, indicating that the previous watermarking approach can be breached. In response to this new threat, we propose a new protocol to make the removal of watermarks more challenging by incorporating multiple possible watermark directions. Our defense approach, WARDEN, notably increases the stealthiness of watermarks and has been empirically shown to be effective against the CSE attack.
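
The intuition behind "multiple possible watermark directions" can be illustrated with a toy sketch. It is not the authors' implementation: the 4-dimensional vectors, the equal-weight mixing rule in `watermark`, the weight of 0.2, and the assumption that the attacker already knows one direction exactly are all hypothetical choices made for illustration. The actual CSE attack finds suspected directions through its clustering and selection steps, which are not reproduced here; only a projection-style elimination step is shown.

```python
import math

def normalize(v):
    """Scale v to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def watermark(embedding, targets, weight):
    """Mix every target direction into the embedding with the same weight.
    This mixing rule is an assumption, not the rule used in the paper."""
    keep = 1.0 - weight * len(targets)
    mixed = [keep * x for x in embedding]
    for t in targets:
        mixed = [m + weight * x for m, x in zip(mixed, t)]
    return normalize(mixed)

def eliminate(embedding, direction):
    """Project out one suspected direction, then re-normalise."""
    u = normalize(direction)
    coeff = sum(x * y for x, y in zip(embedding, u))
    return normalize([x - coeff * y for x, y in zip(embedding, u)])

# Hypothetical clean embedding and two hypothetical watermark directions.
original = normalize([1.0, 0.0, 0.0, 1.0])
t1 = [0.0, 1.0, 0.0, 0.0]
t2 = [0.0, 0.0, 1.0, 0.0]

single = watermark(original, [t1], weight=0.2)      # one direction
multi = watermark(original, [t1, t2], weight=0.2)   # two directions

# The attacker removes only the direction it found (t1).
single_clean = eliminate(single, t1)
multi_clean = eliminate(multi, t1)

residual_single = cosine(single_clean, t1)  # watermark signal left, single case
residual_multi = cosine(multi_clean, t2)    # signal left on the missed direction

print(residual_single, residual_multi)
```

In this toy setting, removing `t1` leaves no trace of the single-direction watermark, while the two-direction embedding keeps a cosine similarity of about 0.32 with the missed direction `t2`, which a verifier could still detect.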

Paper Structure (40 sections, 9 equations, 18 figures, 6 tables)

Figures (18)

  • Figure 1: An overview of recent developments, (a) the model extraction attack on EaaS and (b) the EmbMarker watermarking approach, and of the contributions of this work, (c) the CSE attack and (d) the WARDEN defense. The CSE attack effectively eliminates the watermark (in red) injected by EmbMarker, as shown in part (c). In contrast, WARDEN adds multiple watermarks (in red, blue, and purple), some of which (blue and purple in the verification embedding) are missed by the CSE attack, as illustrated in part (d).
  • Figure 2: The outline of our proposed CSE, consisting of three incremental steps: (i) clustering, (ii) selection, and (iii) elimination. More details are given in Section \ref{sec:attack-framework}.
  • Figure 3: t-SNE visualisation of K-Means clustering ($n=3$) of the MIND dataset, discussed in Section \ref{cse-clustering}. Please refer to Appendix \ref{appendix:num-clusters-cse} for plots of other datasets.
  • Figure 4: Similarity distribution plot between the target embedding and various embedding types. The suspected embeddings returned by the selection module in CSE are distinctly different from unsuspected embeddings and more akin to the target embedding. The results for other datasets are reported in Appendix \ref{appendix:other-cos-sim-dist-plots}.
  • Figure 5: The impact of the number of watermarks ($R$) in WARDEN on the SST2 dataset.
  • ...and 13 more figures