Table of Contents
Fetching ...

Collaborative Threshold Watermarking

Tameem Bakr, Anish Ambreth, Nils Lukas

TL;DR

This work tackles provenance in federated learning by introducing a scalable $(t,K)$-threshold watermarking scheme that enables coalitions of at least $t$ clients to verify a watermark without revealing the secret key $\tau$. Watermark embedding is distributed via Shamir secret sharing and secure aggregation, and verification uses a reconstruction-free inner-product approach with a calibrated $z$-test. Empirical results on CIFAR-10/100 and Tiny ImageNet show robust detectability up to $K=128$ with minimal accuracy loss and resilience to post-training attacks (including adaptive fine-tuning, pruning, and quantization) up to 20% data; distillation remains the most effective removal but is costly. The method scales to large FL deployments and supports shared ownership, improving model provenance in collaborative environments. Overall, threshold watermarking provides practical, robust ownership attribution for jointly trained models in distributed, possibly untrusted, settings.

Abstract

In federated learning (FL), $K$ clients jointly train a model without sharing raw data. Because each participant invests data and compute, clients need mechanisms to later prove the provenance of a jointly trained model. Model watermarking embeds a hidden signal in the weights, but naive approaches either do not scale with many clients as per-client watermarks dilute as $K$ grows, or give any individual client the ability to verify and potentially remove the watermark. We introduce $(t,K)$-threshold watermarking: clients collaboratively embed a shared watermark during training, while only coalitions of at least $t$ clients can reconstruct the watermark key and verify a suspect model. We secret-share the watermark key $τ$ so that coalitions of fewer than $t$ clients cannot reconstruct it, and verification can be performed without revealing $τ$ in the clear. We instantiate our protocol in the white-box setting and evaluate on image classification. Our watermark remains detectable at scale ($K=128$) with minimal accuracy loss and stays above the detection threshold ($z\ge 4$) under attacks including adaptive fine-tuning using up to 20% of the training data.

Collaborative Threshold Watermarking

TL;DR

This work tackles provenance in federated learning by introducing a scalable -threshold watermarking scheme that enables coalitions of at least clients to verify a watermark without revealing the secret key . Watermark embedding is distributed via Shamir secret sharing and secure aggregation, and verification uses a reconstruction-free inner-product approach with a calibrated -test. Empirical results on CIFAR-10/100 and Tiny ImageNet show robust detectability up to with minimal accuracy loss and resilience to post-training attacks (including adaptive fine-tuning, pruning, and quantization) up to 20% data; distillation remains the most effective removal but is costly. The method scales to large FL deployments and supports shared ownership, improving model provenance in collaborative environments. Overall, threshold watermarking provides practical, robust ownership attribution for jointly trained models in distributed, possibly untrusted, settings.

Abstract

In federated learning (FL), clients jointly train a model without sharing raw data. Because each participant invests data and compute, clients need mechanisms to later prove the provenance of a jointly trained model. Model watermarking embeds a hidden signal in the weights, but naive approaches either do not scale with many clients as per-client watermarks dilute as grows, or give any individual client the ability to verify and potentially remove the watermark. We introduce -threshold watermarking: clients collaboratively embed a shared watermark during training, while only coalitions of at least clients can reconstruct the watermark key and verify a suspect model. We secret-share the watermark key so that coalitions of fewer than clients cannot reconstruct it, and verification can be performed without revealing in the clear. We instantiate our protocol in the white-box setting and evaluate on image classification. Our watermark remains detectable at scale () with minimal accuracy loss and stays above the detection threshold () under attacks including adaptive fine-tuning using up to 20% of the training data.
Paper Structure (47 sections, 12 equations, 12 figures, 2 tables, 3 algorithms)

This paper contains 47 sections, 12 equations, 12 figures, 2 tables, 3 algorithms.

Figures (12)

  • Figure 1: An overview of collaborative threshold watermarking. $\textsc{Setup}$ is a one-time procedure to distribute Shamir shares $s_i$ to all clients from which they derive additive shares $w_i$. $\textsc{Embed}$ modifies the FL algorithm to embed our watermark, $\textsc{Verify}$ allows any coalition of $\geq t$ clients to compute the watermark test statistic.
  • Figure 2: Combined cosine similarity distributions for different datasets on ResNet18 models, with fitted normal distributions.
  • Figure 3: (a) Our method sustains statistically significant $z$-scores up to $K=128$, whereas the baseline collapses beyond $K=16$. (b) Increasing the scaling factor $c$ consistently boosts $z$-scores across datasets, showing that watermark strength is tunable.
  • Figure 4: Robustness analysis on CIFAR-100 with $K=32$ and $c=0.025$. We report the trade-off between task accuracy and watermark $z$-score under five attack types: (i) adaptive fine-tuning, (ii) plain fine-tuning, (iii) knowledge distillation, (iv) pruning (magnitude and structured), and (v) quantization. The original model is shown as a star. Dashed curves denote Pareto frontiers for 1%, 5%, 10%, and 20% of the training data, while the red dashed line marks the detection threshold ($z=4$).
  • Figure 5: Per-client DKG protocol runtime versus number of participating clients under a 1000 Mbps network, showing total time and the breakdown between computation and communication.
  • ...and 7 more figures