Collaborative Threshold Watermarking
Tameem Bakr, Anish Ambreth, Nils Lukas
TL;DR
This work addresses provenance in federated learning with a scalable $(t,K)$-threshold watermarking scheme in which any coalition of at least $t$ clients can verify a watermark without revealing the secret key $\tau$. Watermark embedding is distributed via Shamir secret sharing and secure aggregation, and verification uses a reconstruction-free inner-product approach with a calibrated $z$-test. Empirical results on CIFAR-10/100 and Tiny ImageNet show that the watermark remains detectable with up to $K=128$ clients at minimal accuracy loss, and that it withstands post-training attacks including pruning, quantization, and adaptive fine-tuning on up to 20% of the training data; distillation is the most effective removal attack but is costly. The method scales to large FL deployments and supports shared ownership, giving practical and robust ownership attribution for jointly trained models in distributed, possibly untrusted, settings.
Abstract
In federated learning (FL), $K$ clients jointly train a model without sharing raw data. Because each participant invests data and compute, clients need mechanisms to later prove the provenance of a jointly trained model. Model watermarking embeds a hidden signal in the weights, but naive approaches either fail to scale, because per-client watermarks dilute as $K$ grows, or let any individual client verify and potentially remove the watermark. We introduce $(t,K)$-threshold watermarking: clients collaboratively embed a shared watermark during training, while only coalitions of at least $t$ clients can reconstruct the watermark key and verify a suspect model. We secret-share the watermark key $\tau$ so that coalitions of fewer than $t$ clients cannot reconstruct it, and verification can be performed without revealing $\tau$ in the clear. We instantiate our protocol in the white-box setting and evaluate on image classification. Our watermark remains detectable at scale ($K=128$) with minimal accuracy loss and stays above the detection threshold ($z\ge 4$) under attacks including adaptive fine-tuning using up to 20% of the training data.
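
The threshold property and the idea of verifying without reconstructing $\tau$ can be illustrated with a minimal sketch. It assumes Shamir sharing over a prime field with an integer-valued key and integer-quantized weights; the modulus `P`, the function names, and the quantization are hypothetical choices for illustration, not details taken from the paper. Because Shamir sharing is linear, each client can compute an inner product on its own share vector, and any $t$ of these partial results interpolate to $\langle \tau, w \rangle$ without $\tau$ ever being assembled. Secure aggregation and the calibrated $z$-test are omitted here.

```python
import secrets

P = 2**61 - 1  # prime field modulus (assumed for this sketch)

def share_secret(secret, t, K):
    """Split an integer into K Shamir shares (x, y); any t of them determine it."""
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(t - 1)]
    shares = []
    for x in range(1, K + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod P
            y = (y * x + c) % P
        shares.append((x, y))
    return shares

def share_vector(key, t, K):
    """Share every key coordinate independently; returns {client_id: share vector}."""
    per_coord = [share_secret(v, t, K) for v in key]
    return {x: [coord[x - 1][1] for coord in per_coord] for x in range(1, K + 1)}

def lagrange_at_zero(xs):
    """Lagrange coefficients that recover a polynomial's value at x = 0."""
    lam = []
    for i, xi in enumerate(xs):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if i != j:
                num = (num * (-xj)) % P
                den = (den * (xi - xj)) % P
        lam.append(num * pow(den, -1, P) % P)  # modular inverse (Python 3.8+)
    return lam

def local_inner_product(share_vec, weights):
    """Computed by one client on its own shares; the key itself is never seen."""
    return sum(s * w for s, w in zip(share_vec, weights)) % P

def combine(partials):
    """Interpolate at least t partial results {client_id: value} at x = 0."""
    xs = list(partials)
    lam = lagrange_at_zero(xs)
    return sum(l * partials[x] for l, x in zip(lam, xs)) % P

def to_signed(v):
    """Map a field element back to a signed integer."""
    return v - P if v > P // 2 else v

# Toy example: a +/-1 key and integer-quantized suspect weights.
key = [1, -1, 1, 1]
weights = [3, -2, 5, 1]
vecs = share_vector(key, t=3, K=5)
partials = {x: local_inner_product(vecs[x], weights) for x in (1, 3, 5)}
print(to_signed(combine(partials)))  # 11, the inner product of key and weights
```

With fewer than $t$ partial results the interpolation yields a value that is unrelated to the true inner product, which mirrors the requirement that coalitions smaller than $t$ cannot verify.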
