One-shot Empirical Privacy Estimation for Federated Learning
Galen Andrew, Peter Kairouz, Sewoong Oh, Alina Oprea, H. Brendan McMahan, Vinith M. Suriyakumar
TL;DR
This work tackles the problem of efficiently auditing the privacy loss of federated learning models without retraining or task-specific assumptions. It introduces a one-shot auditing framework that inserts random canaries and uses cosine-based test statistics to estimate the DP parameter $\varepsilon$ under the Gaussian mechanism, proving asymptotic correctness in high dimensions. The method extends to FL by injecting canaries as clients and evaluating the final model leakage, demonstrating that final-model privacy can be substantially better than what would be inferred from observing all intermediate updates, while maintaining negligible impact on utility. Empirically, the approach is validated on large-scale FL benchmarks (e.g., StackOverflow and EMNIST), compares favorably to CANIFE, and provides a practical, scalable tool for production FL privacy assessment with broad applicability across architectures and participation patterns.
Abstract
Privacy estimation techniques for differentially private (DP) algorithms are useful for comparing against analytical bounds, or for empirically measuring privacy loss in settings where known analytical bounds are not tight. However, existing privacy auditing techniques usually make strong assumptions on the adversary (e.g., knowledge of intermediate model iterates or the training data distribution), are tailored to specific tasks, model architectures, or DP algorithms, and/or require retraining the model many times (typically on the order of thousands). These shortcomings make deploying such techniques at scale difficult in practice, especially in federated settings where model training can take days or weeks. In this work, we present a novel "one-shot" approach that can systematically address these challenges, allowing efficient auditing or estimation of the privacy loss of a model during the same, single training run used to fit model parameters, and without requiring any a priori knowledge about the model architecture, task, or DP training algorithm. We show that our method provides provably correct estimates for the privacy loss under the Gaussian mechanism, and we demonstrate its performance on well-established FL benchmark datasets under several adversarial threat models.
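To make the Gaussian-mechanism case concrete, the following is a minimal sketch of the random-canary, cosine-statistic idea summarized in the TL;DR. It is an illustration under stated assumptions, not the paper's exact estimator: the function names, the parameter choices, and in particular the step that converts the cosine statistics into an effective noise multiplier (sample standard deviation divided by sample mean) are simplifications introduced here. The conversion from noise multiplier to $\varepsilon$ uses the standard closed-form $(\varepsilon, \delta)$ trade-off of the Gaussian mechanism with sensitivity 1, inverted by bisection.

```python
import math
import numpy as np


def normal_cdf(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gaussian_delta(eps, sigma):
    """delta(eps) for a Gaussian mechanism with sensitivity 1 and noise std sigma."""
    a = 1.0 / (2.0 * sigma)
    return normal_cdf(a - eps * sigma) - math.exp(eps) * normal_cdf(-a - eps * sigma)


def gaussian_epsilon(sigma, delta, hi=50.0):
    """Smallest eps with delta(eps) <= delta, by bisection.

    The search is capped at `hi` (an arbitrary choice made for this sketch),
    so the result saturates at `hi` for very small sigma.
    """
    if gaussian_delta(0.0, sigma) <= delta:
        return 0.0
    lo = 0.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gaussian_delta(mid, sigma) > delta:
            lo = mid
        else:
            hi = mid
    return hi


def canary_cosines(d, k, sigma, rng):
    """Insert k random unit-norm canaries into one noisy sum; return their cosines."""
    canaries = rng.standard_normal((k, d))
    canaries /= np.linalg.norm(canaries, axis=1, keepdims=True)
    # Released vector: sum of all canaries plus isotropic Gaussian noise.
    rho = canaries.sum(axis=0) + sigma * rng.standard_normal(d)
    # Canaries have unit norm, so dividing by ||rho|| gives the cosine.
    return canaries @ rho / np.linalg.norm(rho)


def estimate_epsilon(cosines, delta):
    """Assumed simplification: treat std/mean of the cosines as the noise multiplier."""
    mean = float(np.mean(cosines))
    if mean <= 0.0:
        # No detectable canary signal: report no measurable leakage.
        return math.inf, 0.0
    sigma_eff = float(np.std(cosines, ddof=1)) / mean
    return sigma_eff, gaussian_epsilon(sigma_eff, delta)
```

For example, calling `canary_cosines(4000, 400, 1.0, np.random.default_rng(0))` and passing the result to `estimate_epsilon` with `delta=1e-5` yields an effective noise multiplier that should land near the true value of 1.0 when the dimension is much larger than the number of canaries, together with the corresponding $\varepsilon$. In this sketch the estimate is biased upward by roughly $\sqrt{1 + k/d}$ because the canaries interfere with one another, which is why it is only meaningful in the high-dimensional regime the TL;DR refers to.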
