
EmbTracker: Traceable Black-box Watermarking for Federated Language Models

Haodong Zhao, Jinming Hu, Yijie Bai, Tian Dong, Wei Du, Zhuosheng Zhang, Yanjiao Chen, Haojin Zhu, Gongshen Liu

Abstract

Federated Language Model (FedLM) training enables collaborative learning without sharing raw data, yet it introduces a critical vulnerability: any untrustworthy client may leak the functional model instance it receives. Current watermarking schemes for FedLM often require white-box access and client-side cooperation, and they provide only group-level proof of ownership rather than individual traceability. We propose EmbTracker, a server-side, traceable black-box watermarking framework specifically designed for FedLMs. EmbTracker achieves black-box verifiability by embedding a backdoor-based watermark that is detectable through simple API queries. Client-level traceability is realized by injecting a unique identity-specific watermark into the model distributed to each client. In this way, a leaked model can be attributed to a specific culprit, ensuring robustness even against non-cooperative participants. Extensive experiments on various language and vision-language models demonstrate that EmbTracker achieves robust traceability with verification rates near 100%, high resilience against removal attacks (fine-tuning, pruning, quantization), and negligible impact on primary task performance (typically within 1-2%).

Paper Structure (29 sections, 4 equations, 11 figures, 7 tables)

Figures (11)

  • Figure 1: Illustration of the risk of client model leakage in federated language model training. Since all clients in FL can obtain the same global model, traditional watermarks cannot distinguish the source of the leak. $\mathsf{EmbTracker}$ creates a unique watermark for each client through the server, which can accurately track the model leaker.
  • Figure 2: The overall process of $\mathsf{EmbTracker}$. (i) Trigger generation: identity information is used to generate $sig$ for each client. (ii) Watermark injection: the watermarked model $M_{w}$ is trained on the server and distributed. (iii) Watermark verification: only samples with client-specific triggers can pass the verification.
  • Figure 3: The workflow of the proposed watermark injection process. Step 1: The server uses a universal trigger ($Tr_u$) to train a universal watermark embedding vector ($W_w$), updating only the trigger token embeddings. Step 2: The server replaces the embedding vectors of the client-specific triggers with $W_w$, ensuring each client receives a distinct watermark. Step 3: Clients perform local training on their private data using PEFT methods. Step 4: The server collects the updated PEFT modules from the clients, aggregates them, performs watermark enhancement training, and distributes the enhanced PEFT modules.
  • Figure 4: Illustration of the verification interval (VI). The consistently large VI reflects the effectiveness and reliability of the watermarking scheme in accurately attributing model ownership while minimizing watermark collisions.
  • Figure 5: Evaluation of the applicability of $\mathsf{EmbTracker}$ across different models. The results demonstrate that $\mathsf{EmbTracker}$ consistently achieves high watermark VRs and maintains robust performance on the primary ACCs, regardless of the underlying language model.
  • ...and 6 more figures
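
The captions of Figures 2 and 3 describe per-client trigger generation, embedding replacement, and trigger-based verification. The following is a minimal Python sketch of those three ideas, not the authors' implementation. Several pieces are assumptions made for illustration: hashing the client identity with SHA-256 stands in for the paper's $sig$ generation, the embedding table is a plain list of lists, the function names (`derive_trigger_ids`, `personalize_embeddings`, `verification_rate`, `trace_leak`, `make_toy_model`) are hypothetical, and the toy model simply returns the target label whenever an input token carries the watermark vector $W_w$.

```python
import hashlib


def derive_trigger_ids(client_id, vocab_size, k=2):
    """Map a client identity to k distinct token ids.

    Assumption: SHA-256 hashing stands in for the paper's signature scheme.
    """
    if k > vocab_size:
        raise ValueError("k cannot exceed vocab_size")
    ids, counter = [], 0
    while len(ids) < k:
        digest = hashlib.sha256(f"{client_id}:{counter}".encode()).digest()
        token_id = int.from_bytes(digest[:8], "big") % vocab_size
        if token_id not in ids:  # keep the trigger tokens distinct
            ids.append(token_id)
        counter += 1
    return ids


def personalize_embeddings(embeddings, trigger_ids, w_w):
    """Return a copy of the table whose trigger rows are overwritten by w_w."""
    personalized = [row[:] for row in embeddings]
    for token_id in trigger_ids:
        personalized[token_id] = list(w_w)
    return personalized


def verification_rate(predict, trigger_ids, probes, target_label):
    """Fraction of triggered probes that the model maps to target_label."""
    hits = sum(predict(probe + trigger_ids) == target_label for probe in probes)
    return hits / len(probes)


def trace_leak(predict, client_triggers, probes, target_label):
    """Query the suspect model with every client's triggers (black-box)."""
    rates = {
        client: verification_rate(predict, ids, probes, target_label)
        for client, ids in client_triggers.items()
    }
    return max(rates, key=rates.get), rates


def make_toy_model(embeddings, w_w, target_label):
    """Toy stand-in for a deployed model, used only to exercise the sketch."""
    def predict(token_ids):
        fired = any(embeddings[t] == list(w_w) for t in token_ids)
        return target_label if fired else 0
    return predict


if __name__ == "__main__":
    table = [[float(i), 0.0] for i in range(10)]  # 10-token toy vocabulary
    w_w = [9.9, -9.9]                             # universal watermark vector
    triggers = {"alice": [5, 6], "bob": [7, 8]}   # fixed ids for readability
    leaked = personalize_embeddings(table, triggers["alice"], w_w)
    suspect = make_toy_model(leaked, w_w, target_label=1)
    print(trace_leak(suspect, triggers, [[1, 2], [3, 4]], target_label=1))
```

In this toy setting only the triggers of the client whose copy was leaked activate the backdoor, so that client receives the highest verification rate. A real deployment would query the suspect model through its API and would need a decision threshold rather than a simple maximum.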

Theorems & Definitions (2)

  • Definition 1: Traceability
  • Definition 2: Collision