
Your Semantic-Independent Watermark is Fragile: A Semantic Perturbation Attack against EaaS Watermark

Zekun Fei, Biao Yi, Jianing Geng, Ruiqi He, Lihai Nie, Zheli Liu

TL;DR

The paper investigates backdoor-based watermarks used to protect the copyright of Embedding-as-a-Service (EaaS) and reveals a critical flaw: because the watermark signal is independent of the input's semantics, adaptive semantic perturbations can bypass verification. It introduces the Semantic Perturbation Attack (SPA), a black-box method that uses suffix-based perturbations and embedding-tightness metrics (cosine, $L_2$, PCA) to identify and remove watermarked samples while preserving embedding utility, with a reported $\mathrm{TPR} > 0.95$. Experiments across four datasets and multiple watermark schemes show that SPA is effective against both single- and multi-watermark defenses. The work argues that watermark design should incorporate semantic awareness and discusses potential defense strategies against SPA.

Abstract

Embedding-as-a-Service (EaaS) has emerged as a successful business pattern but faces significant challenges from various forms of copyright infringement, particularly API misuse and model extraction attacks. Various studies have proposed backdoor-based watermarking schemes to protect the copyright of EaaS services. In this paper, we reveal that previous watermarking schemes possess semantic-independent characteristics and propose the Semantic Perturbation Attack (SPA). Our theoretical and experimental analyses demonstrate that this semantic-independent nature makes current watermarking schemes vulnerable to adaptive attacks that exploit semantic perturbation tests to bypass watermark verification. Extensive experimental results across multiple datasets demonstrate that the True Positive Rate (TPR) for identifying watermarked samples under SPA can exceed 95%, rendering watermarks ineffective while maintaining the high utility of embeddings. Furthermore, we discuss potential defense strategies to mitigate SPA. Our code is available at https://github.com/Zk4-ps/EaaS-Embedding-Watermark.
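
To make the idea of an embedding-tightness test more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that one text is embedded several times under different suffix perturbations, and that a watermarked sample is pulled toward a fixed, semantic-independent target vector, so its perturbed embeddings cluster more tightly than those of a benign sample. The function names, the weighted-sum watermark model, the random noise standing in for real suffix perturbations, and the exact definition of each score (mean pairwise cosine similarity, mean pairwise $L_2$ distance, largest covariance eigenvalue as the PCA score) are all illustrative assumptions.

```python
import numpy as np


def normalize(x):
    """Scale vectors along the last axis to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def tightness_scores(embs):
    """Score how tightly k perturbed embeddings of one text cluster.

    embs: array of shape (k, d), one row per suffix perturbation.
    These score definitions are assumptions made for illustration.
    """
    embs = normalize(np.asarray(embs, dtype=float))
    k = embs.shape[0]
    upper = np.triu_indices(k, k=1)  # index each unordered pair once

    cosine = (embs @ embs.T)[upper].mean()
    diffs = embs[:, None, :] - embs[None, :, :]
    l2 = np.linalg.norm(diffs, axis=-1)[upper].mean()

    # Largest eigenvalue of the sample covariance, via singular values.
    centered = embs - embs.mean(axis=0)
    singular = np.linalg.svd(centered, compute_uv=False)
    pca = singular[0] ** 2 / (k - 1)

    return {"cosine": float(cosine), "l2": float(l2), "pca": float(pca)}


def simulate_perturbed_embeddings(rng, k=8, d=64, noise=0.1,
                                  weight=0.0, target=None):
    """Hypothetical stand-in for querying an EaaS API with k suffixes.

    Random noise mimics the semantic shift caused by each suffix. With
    weight > 0 the result is mixed with a fixed target vector, an assumed
    model of a semantic-independent backdoor watermark.
    """
    base = normalize(rng.normal(size=d))
    embs = normalize(base + noise * rng.normal(size=(k, d)))
    if weight > 0.0:
        embs = normalize((1.0 - weight) * embs + weight * target)
    return embs


rng = np.random.default_rng(0)
target = normalize(rng.normal(size=64))  # assumed watermark target vector

benign = tightness_scores(simulate_perturbed_embeddings(rng))
marked = tightness_scores(
    simulate_perturbed_embeddings(rng, weight=0.8, target=target)
)

print("benign:", benign)
print("marked:", marked)
```

Under these toy assumptions, the watermarked sample shows higher pairwise cosine similarity, smaller pairwise $L_2$ distance, and a smaller leading eigenvalue, because the fixed target component does not move when the suffix changes. An attacker in this setting would flag samples whose scores fall on the tight side of a chosen threshold.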

Paper Structure

This paper contains 28 sections, 4 equations, 9 figures, 3 tables, 2 algorithms.

Figures (9)

  • Figure 1: An Overview of EaaS Watermark.
  • Figure 2: Semantic Perturbation Demonstration in 2D Space. When the perturbed angle reaches $180^\circ$, the relationship $\theta_1 < \theta_2$ holds for any watermark vector.
  • Figure 3: The Framework of Semantic Perturbation Attack. Attackers apply the semantic perturbation strategy to modify the original query dataset. The semantic-independent characteristic enables the selection and deletion of watermarked embeddings, ultimately resulting in a purified dataset that bypasses watermark verification.
  • Figure 4: PCA Score Visualization. Significant distribution shift of the eigenvalues can be observed.
  • Figure 5: Threshold Selection. Our semantic perturbation strategy induces a bimodal distribution of PCA scores.
  • ...and 4 more figures