Towards Robust Federated Analytics via Differentially Private Measurements of Statistical Heterogeneity

Mary Scott, Graham Cormode, Carsten Maple

TL;DR

The paper tackles the problem of privately measuring statistical heterogeneity (SH) in federated analytics (FA) under differential privacy (DP). It applies the Analytic Gaussian Mechanism (AGM) to optimally calibrate Gaussian noise for three SH measures (dispersion, $Q$, and $I^{2}$) and derives their mean-squared errors and 95% confidence intervals from first principles. The authors show that the AGM, particularly in a distributed setting with secure aggregation, is more accurate than the classical Gaussian mechanism and centralized approaches, even under substantial SH. Experiments on CIFAR-10, CIFAR-100, and Fashion-MNIST show that the AGM achieves significantly lower MSEs, with $I^{2}$ typically offering the best sensitivity to SH while remaining robust to the privacy budget and data heterogeneity.
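As a point of reference, the sketch below computes the three measures non-privately on hypothetical per-client estimates. It makes two assumptions:

- $Q$ and $I^{2}$ follow their conventional meta-analysis definitions. $Q$ is Cochran's $Q$ with inverse-variance weights, and $I^{2} = \max(0, (Q - (k-1))/Q)$ for $k$ clients.
- Dispersion is taken to be the unweighted sample variance of the client estimates.

The paper's own definitions may differ in detail, and no DP noise is added here.

```python
from typing import Sequence


def dispersion(estimates: Sequence[float]) -> float:
    """Unweighted sample variance of the client estimates (assumed definition)."""
    k = len(estimates)
    mean = sum(estimates) / k
    return sum((x - mean) ** 2 for x in estimates) / (k - 1)


def cochran_q(estimates: Sequence[float], variances: Sequence[float]) -> float:
    """Cochran's Q with inverse-variance weights w_i = 1 / v_i."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * x for w, x in zip(weights, estimates)) / sum(weights)
    return sum(w * (x - pooled) ** 2 for w, x in zip(weights, estimates))


def i_squared(q: float, k: int) -> float:
    """Higgins' I^2 = max(0, (Q - (k - 1)) / Q): share of variation beyond chance."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q)


# Hypothetical per-client means and their sampling variances.
client_means = [0.10, 0.30, 0.50, 0.70]
client_vars = [0.01, 0.01, 0.01, 0.01]

q = cochran_q(client_means, client_vars)
print(f"dispersion={dispersion(client_means):.4f}  Q={q:.2f}  "
      f"I^2={i_squared(q, len(client_means)):.3f}")
```

With four equally precise clients whose means range from 0.1 to 0.7, this gives $Q = 20$ and $I^{2} = 0.85$. Most of the observed spread is therefore attributed to heterogeneity rather than sampling error. Unlike $Q$, $I^{2}$ is normalized to $[0, 1]$, which makes it easier to compare across different numbers of clients.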

Abstract

Statistical heterogeneity is a measure of how skewed the samples of a dataset are. It is a common problem in the study of differential privacy that using a statistically heterogeneous dataset results in a significant loss of accuracy. In federated scenarios, statistical heterogeneity is more likely to occur, so this problem is even more pressing. We explore the three most promising ways to measure statistical heterogeneity and give formulae for their accuracy, while simultaneously incorporating differential privacy. We find the optimum privacy parameters via an analytic mechanism, which incorporates root-finding methods. We validate the main theorems and related hypotheses experimentally, and test the robustness of the analytic mechanism to different heterogeneity levels. The analytic mechanism in a distributed setting delivers superior accuracy to all combinations involving the classic mechanism and/or the centralized setting. None of the measures of statistical heterogeneity loses significant accuracy when a heterogeneous sample is used.
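The "analytic mechanism, which incorporates root-finding methods" calibrates the Gaussian noise scale numerically instead of through a closed-form bound. The sketch below follows the analytic Gaussian mechanism characterization of Balle and Wang (2018). Adding $N(0, \sigma^{2})$ noise to a query with $\ell_2$-sensitivity $\Delta$ is $(\varepsilon, \delta)$-DP if and only if $\Phi(\Delta/(2\sigma) - \varepsilon\sigma/\Delta) - e^{\varepsilon}\Phi(-\Delta/(2\sigma) - \varepsilon\sigma/\Delta) \le \delta$. The sketch finds the smallest such $\sigma$ by bisection.

The distributed part is a hypothetical illustration, not the paper's protocol. Each of $n$ clients adds $N(0, \sigma^{2}/n)$ noise, so the securely aggregated sum carries $N(0, \sigma^{2})$ noise. The function names and parameter values are illustrative.

```python
import math
import random
from typing import Sequence


def std_normal_cdf(x: float) -> float:
    """Phi(x) for the standard normal; erfc keeps the lower tail accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def agm_delta(sigma: float, epsilon: float, sensitivity: float) -> float:
    """Smallest delta for which adding N(0, sigma^2) noise is (epsilon, delta)-DP."""
    a = sensitivity / (2.0 * sigma)
    b = epsilon * sigma / sensitivity
    return std_normal_cdf(a - b) - math.exp(epsilon) * std_normal_cdf(-a - b)


def agm_sigma(epsilon: float, delta: float, sensitivity: float,
              rel_tol: float = 1e-12) -> float:
    """Smallest sigma with agm_delta(sigma) <= delta, by bisection (agm_delta decreases in sigma)."""
    lo, hi = 0.0, sensitivity
    while agm_delta(hi, epsilon, sensitivity) > delta:  # widen the bracket until hi is private
        lo, hi = hi, 2.0 * hi
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if agm_delta(mid, epsilon, sensitivity) > delta:
            lo = mid  # not private enough: the root lies above mid
        else:
            hi = mid  # private: keep mid as the current upper bound
    return hi


def distributed_noisy_sum(values: Sequence[float], sigma: float,
                          rng: random.Random) -> float:
    """Hypothetical distributed release: each of n clients adds N(0, sigma^2 / n)."""
    client_sd = sigma / math.sqrt(len(values))
    noisy_shares = [v + rng.gauss(0.0, client_sd) for v in values]
    return sum(noisy_shares)  # stands in for secure aggregation, which reveals only the sum


epsilon, delta, sensitivity = 0.5, 1e-5, 1.0
sigma_agm = agm_sigma(epsilon, delta, sensitivity)
sigma_cgm = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
print(f"AGM sigma = {sigma_agm:.3f}, classical sigma = {sigma_cgm:.3f}")
print(distributed_noisy_sum([0.2, 0.4, 0.6, 0.8], sigma_agm, random.Random(0)))
```

The bisection solves the exact privacy condition rather than a sufficient bound. The returned $\sigma$ is therefore never larger than the classical one wherever the classical theorem applies. Splitting the noise across clients keeps the total noise variance at $\sigma^{2}$. It protects individual contributions only if the aggregation step reveals nothing beyond the sum.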

Paper Structure

This paper contains 17 sections, 24 theorems, 29 equations, 1 figure, 6 tables, 1 algorithm.

Key Result

Theorem 3.8

(Classical Gaussian Mechanism (CGM)). For any $\varepsilon, \delta \in (0, 1)$, the Gaussian output perturbation mechanism with $\Delta = \sqrt{d}/n$ and $\sigma = \Delta \sqrt{2 \log(1.25/\delta)}/\varepsilon$ is $(\varepsilon, \delta)$-differentially private [balleimproving].
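Theorem 3.8 gives the noise scale in closed form. Below is a minimal sketch that evaluates it and perturbs a $d$-dimensional mean. It assumes each record lies in $[0, 1]^{d}$, so replacing one record moves the mean by at most $\sqrt{d}/n$ in $\ell_2$ norm. The records, $\varepsilon$ and $\delta$ are hypothetical.

```python
import math
import random
from typing import List, Sequence


def cgm_sigma(epsilon: float, delta: float, d: int, n: int) -> float:
    """Noise scale from Theorem 3.8: Delta = sqrt(d)/n, sigma = Delta*sqrt(2 log(1.25/delta))/epsilon."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("Theorem 3.8 is stated for epsilon, delta in (0, 1)")
    sensitivity = math.sqrt(d) / n
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def private_mean(records: Sequence[Sequence[float]], epsilon: float, delta: float,
                 rng: random.Random) -> List[float]:
    """Coordinate-wise mean of n records in [0, 1]^d plus isotropic Gaussian noise."""
    n, d = len(records), len(records[0])
    sigma = cgm_sigma(epsilon, delta, d, n)
    mean = [sum(r[j] for r in records) / n for j in range(d)]
    return [m + rng.gauss(0.0, sigma) for m in mean]


rng = random.Random(42)
data = [[rng.random() for _ in range(3)] for _ in range(1000)]  # hypothetical records in [0, 1]^3
print(cgm_sigma(0.5, 1e-5, d=3, n=1000))
print(private_mean(data, 0.5, 1e-5, rng))
```

The guard on $\varepsilon$ mirrors the theorem's hypothesis. For $\varepsilon \ge 1$ the theorem does not apply, which is one motivation for calibrating with the analytic mechanism instead.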

Figures (1)

  • Figure 1: Effect of privacy parameter $\varepsilon$ on EMSE of AGM, depending on number of labels and statistical heterogeneity.
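Figure 1 reports the EMSE of the AGM. The sketch below assumes EMSE denotes an empirically estimated mean-squared error and shows one way such a quantity can be estimated by Monte Carlo. It adds $N(0, \sigma^{2})$ noise to a fixed hypothetical statistic many times and averages the squared error, which should approach $\sigma^{2}$.

To keep it short, it uses the classical scale $\sigma \propto 1/\varepsilon$ as a stand-in for the AGM calibration. The statistic, sensitivity and $\delta$ are hypothetical. The paper's EMSE is computed on its own SH measures and experimental pipeline.

```python
import math
import random


def empirical_mse(true_value: float, sigma: float, trials: int,
                  rng: random.Random) -> float:
    """Average squared error of true_value + N(0, sigma^2) over independent trials."""
    total = 0.0
    for _ in range(trials):
        noisy = true_value + rng.gauss(0.0, sigma)
        total += (noisy - true_value) ** 2
    return total / trials


rng = random.Random(7)
true_i2 = 0.85                   # hypothetical value of the released statistic
delta, sensitivity = 1e-5, 0.01  # hypothetical privacy and sensitivity parameters
for epsilon in (0.1, 0.3, 0.5, 0.9):
    # Classical scale as a stand-in; an AGM calibration would give a smaller sigma here.
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    emse = empirical_mse(true_i2, sigma, trials=20_000, rng=rng)
    print(f"epsilon={epsilon:.1f}  sigma^2={sigma ** 2:.2e}  EMSE={emse:.2e}")
```

Because $\sigma$ scales as $1/\varepsilon$ here, the EMSE falls roughly as $1/\varepsilon^{2}$ as the privacy budget grows.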

Theorems & Definitions (32)

  • Definition 3.1
  • Definition 3.2
  • Definition 3.3
  • Definition 3.4
  • Definition 3.5
  • Definition 3.6
  • Definition 3.7
  • Theorem 3.8
  • Theorem 3.9
  • Theorem 3.11
  • ...and 22 more