Distributed, communication-efficient, and differentially private estimation of KL divergence
Mary Scott, Sayan Biswas, Graham Cormode, Carsten Maple
TL;DR
This paper addresses measuring distribution drift in privacy-sensitive federated data by estimating the KL divergence $D_{KL}(\Pi \| P)$ between a reference distribution $\Pi$ and a private data distribution $P$. It introduces PRIEST-KLD, a family of three DP-enabled estimators that trade off trust, communication, and accuracy across the Fully Trusted (Trusted), TAgg, and Fully Distributed (Dist) models, using Monte Carlo subsampling and secure aggregation. The estimators are shown to be unbiased in the Trusted and TAgg models, where DP noise is added to guarantee $(\varepsilon,\delta)$-DP. The Dist model instead relies on local DP noise and yields a biased estimator, but offers strong privacy and low communication cost. Empirical results on FEMNIST show that the Dist model often achieves the best privacy-utility trade-off, and they guide parameter choices for practical deployment: the noise level via $\varepsilon$, the sampling rate via $|C_t|$, and the variance-control parameter $\lambda$.
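The TL;DR combines three ingredients: Monte Carlo sampling, DP noise, and a variance-control parameter $\lambda$. The sketch below shows one standard way these fit together in a central, Trusted-style release. It is an illustration under stated assumptions, not the PRIEST-KLD implementation. We assume a control-variate term $-\log r + \lambda(r-1)$ with $r = P(x)/\Pi(x)$ and $x \sim \Pi$. Because $\mathbb{E}_\Pi[r-1]=0$ when $\Pi$ and $P$ share a support, this term is unbiased for every $\lambda$. We have not confirmed that this is the paper's exact estimator. The sketch also assumes discrete distributions, the classic Gaussian mechanism, and a caller-supplied `sensitivity` that it does not derive. All function names are hypothetical. Client sampling ($|C_t|$), secure aggregation, and the Dist model's local noise are not modelled.

```python
import math
import random


def kl_term(pi_x: float, p_x: float, lam: float) -> float:
    """One Monte Carlo term for D_KL(Pi || P), evaluated at a sample x ~ Pi.

    Assumed control-variate form: with r = P(x) / Pi(x), E_Pi[r - 1] = 0 when
    both distributions share a support, so -log r + lam * (r - 1) is unbiased
    for every lam. lam = 1 makes every term non-negative (log r <= r - 1).
    """
    r = p_x / pi_x
    return -math.log(r) + lam * (r - 1.0)


def exact_kl(pi: dict, p: dict) -> float:
    """Closed-form D_KL(Pi || P) for discrete distributions, for comparison."""
    return sum(w * math.log(w / p[x]) for x, w in pi.items() if w > 0)


def gaussian_sigma(sensitivity: float, eps: float, delta: float) -> float:
    """Classic Gaussian-mechanism noise scale for (eps, delta)-DP (needs 0 < eps < 1)."""
    if not 0.0 < eps < 1.0:
        raise ValueError("classic Gaussian calibration requires 0 < eps < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps


def private_kl_estimate(pi, p, num_samples, lam, eps, delta, sensitivity, rng):
    """Monte Carlo KL estimate released once with central Gaussian noise.

    `sensitivity` is a hypothetical, caller-supplied bound on how much the
    sample mean can change between neighbouring datasets. Zero-mean noise
    keeps the estimate unbiased.
    """
    support = list(pi)
    xs = rng.choices(support, weights=[pi[x] for x in support], k=num_samples)
    mean = sum(kl_term(pi[x], p[x], lam) for x in xs) / num_samples
    return mean + rng.gauss(0.0, gaussian_sigma(sensitivity, eps, delta))


if __name__ == "__main__":
    rng = random.Random(0)
    reference = {"a": 0.5, "b": 0.3, "c": 0.2}  # Pi: public reference
    private = {"a": 0.2, "b": 0.3, "c": 0.5}    # P: hypothetical private data
    print("exact:", exact_kl(reference, private))
    for lam in (0.0, 1.0):
        est = private_kl_estimate(reference, private, 50_000, lam,
                                  eps=0.5, delta=1e-5, sensitivity=1e-3, rng=rng)
        print(f"lam={lam}:", est)
```

Every value of $\lambda$ targets the same expectation, so varying $\lambda$ changes only the spread of the terms. In this toy example, $\lambda = 1$ gives much smaller per-sample variance than $\lambda = 0$. This is consistent with $\lambda$ being described as a variance-control parameter.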
Abstract
A key task in managing distributed, sensitive data is to measure the extent to which a distribution changes. Understanding this drift can effectively support a variety of federated learning and analytics tasks. However, in many practical settings, sharing such information can be undesirable (e.g., due to privacy concerns) or infeasible (e.g., due to high communication costs). In this work, we describe novel algorithmic approaches for estimating the KL divergence of data across federated models of computation, under differential privacy. We analyze their theoretical properties and present an empirical study of their performance. For each setting, we explore parameter choices that optimize the algorithm's accuracy; these yield sub-variations applicable to real-world tasks with different context- and application-specific trust requirements. Our experimental results confirm that our private estimators achieve accuracy comparable to a baseline algorithm without differential privacy guarantees.
