Harmless Backdoor-based Client-side Watermarking in Federated Learning

Kaijing Luo, Ka-Ho Chow

TL;DR

Federated Learning raises IP protection concerns when clients watermark collaboratively trained models using backdoors, risking watermark collisions and malicious misuse. Sanitizer offers a server-side pipeline that identifies a compact backdoor subnet per client, performs round-spread reverse engineering to recover triggers, prunes backdoor effects, conducts harmless relearning in client-specific benign input subspaces, and enables verification with high reliability. The approach delivers near-perfect watermark verification, dramatically reduces malicious exploitation, and achieves substantial efficiency gains (lower GPU memory and faster per-round processing) while preserving main task accuracy. Its architecture-agnostic design and strong performance under non-IID data and conflicting triggers suggest practical viability for scalable IP protection in real-world FL deployments.

Abstract

Protecting intellectual property (IP) in federated learning (FL) is increasingly important as clients contribute proprietary data to collaboratively train models. Model watermarking, particularly through backdoor-based methods, has emerged as a popular approach for verifying ownership and contributions in deep neural networks trained via FL. By manipulating their datasets, clients can embed a secret pattern, resulting in non-intuitive predictions that serve as proof of participation, useful for claiming incentives or IP co-ownership. However, this technique faces practical challenges: (i) client watermarks can collide, leading to ambiguous ownership claims, and (ii) malicious clients may exploit watermarks to manipulate model predictions for harmful purposes. To address these issues, we propose Sanitizer, a server-side method that ensures client-embedded backdoors can be activated only in harmless environments and not by natural queries. It identifies subnets within client-submitted models, extracts backdoors throughout the FL process, and confines them to harmless, client-specific input subspaces. This approach not only enhances Sanitizer's efficiency but also resolves conflicts when clients use similar triggers with different target labels. Our empirical results demonstrate that Sanitizer achieves near-perfect success in verifying client contributions while mitigating the risks of malicious watermark use. Additionally, it reduces GPU memory consumption by 85% and cuts processing time by at least 5x compared to the baseline. Our code is open-sourced at https://hku-tasr.github.io/Sanitizer/.
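
To make the collision problem concrete, the following is a minimal sketch of how two backdoor-based watermarks can conflict and how a claim might be checked. It is an illustration of the problem setting only, not Sanitizer's method: the `Watermark` record, the flattened binary trigger encoding, the 0.9 similarity threshold, and the helper names are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Watermark:
    client: str
    trigger: tuple      # flattened binary trigger pattern (hypothetical encoding)
    target_label: int   # label the backdoor is meant to force


def trigger_similarity(a, b):
    """Fraction of positions where two equal-length trigger patterns agree."""
    if len(a) != len(b):
        raise ValueError("trigger patterns must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def find_collisions(watermarks, threshold=0.9):
    """Return client pairs whose triggers are similar but whose targets differ."""
    collisions = []
    for i, w1 in enumerate(watermarks):
        for w2 in watermarks[i + 1:]:
            similar = trigger_similarity(w1.trigger, w2.trigger) >= threshold
            if similar and w1.target_label != w2.target_label:
                collisions.append((w1.client, w2.client))
    return collisions


def verification_rate(predict, triggered_inputs, target_label):
    """Fraction of triggered inputs that the model maps to the claimed label."""
    hits = sum(predict(x) == target_label for x in triggered_inputs)
    return hits / len(triggered_inputs)


# Illustrative values only: Jack and Sam use nearly identical triggers
# with different target labels, so their ownership claims are ambiguous.
jack = Watermark("Jack", (1, 1, 0, 0, 1, 1, 0, 0, 1, 1), target_label=3)
sam = Watermark("Sam", (1, 1, 0, 0, 1, 1, 0, 0, 1, 0), target_label=7)
bob = Watermark("Bob", (0, 0, 1, 1, 0, 0, 1, 1, 0, 0), target_label=3)
collisions = find_collisions([jack, sam, bob])
```

Here `predict` stands in for any callable that maps an input to a predicted label. A single model cannot satisfy both Jack's and Sam's claims on the same triggered input, which is the ambiguity that confining each backdoor to a client-specific input subspace is meant to remove.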

Paper Structure

This paper contains 38 sections, 7 equations, 10 figures, 11 tables.

Figures (10)

  • Figure 1: Without Sanitizer, watermark collision may occur when two clients (e.g., Jack and Sam) use a similar trigger with different target labels. Furthermore, an adversary (e.g., Bob) can control the model to mispredict a query attached with a special trigger, originally used as a watermark, for malicious purposes (e.g., "Stop" becomes "Ahead Only").
  • Figure 2: With Sanitizer, triggers become ineffective when placed on natural images (e.g., Bob). They do not suffer from watermark collision and can only lead to non-intuitive predictions when used in client-specific harmless environments (e.g., Jack and Sam).
  • Figure 3: Sanitizer offers significantly better scalability. It keeps the total server-side time consumption consistently lower (green) than the baseline (red) as the number of participating clients increases.
  • Figure 4: Examples of harmful (natural) inputs compared to the harmless (artificial) inputs of each client (e.g., Bob, Sam, and Jack).
  • Figure 5: Overview of Sanitizer pipeline. Sanitizer introduces three key enhancements on the server-side during the FL process: ① Backdoor Subnet Identification and ② Extraction, ③ Round-spread Trigger Recovery, and ④ Pruning and Aggregation for the next round. After the FL process, ⑤ Harmless Relearning ensures that the resultant FL-trained model is embedded with harmless watermarks, making it ready for deployment.
  • ...and 5 more figures