Samplable Anonymous Aggregation for Private Federated Data Analysis

Kunal Talwar, Shan Wang, Audra McMillan, Vojta Jina, Vitaly Feldman, Pansy Bansal, Bailey Basile, Aine Cahill, Yi Sheng Chan, Mike Chatzidakis, Junye Chen, Oliver Chick, Mona Chitnis, Suman Ganta, Yusuf Goren, Filip Granqvist, Kristine Guo, Frederic Jacobs, Omid Javidbakht, Albert Liu, Richard Low, Dan Mascenik, Steve Myers, David Park, Wonhee Park, Gianni Parsa, Tommy Pauly, Christian Priebe, Rehan Rishi, Guy Rothblum, Michael Scaria, Linmao Song, Congzheng Song, Karl Tarbe, Sebastian Vogt, Luke Winstrom, Shundong Zhou

TL;DR

Samplable Anonymous Aggregation (SA_2) addresses the privacy-utility gap in private federated data analysis by enabling aggregates over random subsets with anonymity to achieve central-like DP guarantees without a single trusted curator. The authors propose a two-server architecture based on additive secret sharing (Prio) augmented with on-device sampling, anonymization, and rate-limited anonymous authentication to realize SA_2. They provide formal privacy and utility analyses, showing near-central DP guarantees for histograms and strong DP bounds for private federated learning, along with a security-focused architecture description and hardening strategies. The work also discusses trade-offs between federated statistics and learning, trust models, and potential extensions to broader classes of analyses.

Abstract

We revisit the problem of designing scalable protocols for private statistics and private federated learning when each device holds its private data. Locally differentially private algorithms require little trust but are (provably) limited in their utility. Centrally differentially private algorithms can allow significantly better utility but require a trusted curator. This gap has led to significant interest in the design and implementation of simple cryptographic primitives that can allow central-like utility guarantees without having to trust a central server. Our first contribution is to propose a new primitive that allows for efficient implementation of several commonly used algorithms, and allows for privacy accounting that is close to that in the central setting without requiring the strong trust assumptions it entails. Shuffling and aggregation primitives that have been proposed in earlier works enable this for some algorithms, but have significant limitations as primitives. We propose a Samplable Anonymous Aggregation primitive, which computes an aggregate over a random subset of the inputs, and show that it leads to better privacy-utility trade-offs for various fundamental tasks. Secondly, we propose a system architecture that implements this primitive and perform a security analysis of the proposed system. Our design combines additive secret-sharing with anonymization and authentication infrastructures.
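
As a rough illustration of the primitive described in the abstract, the sketch below simulates on-device sampling followed by additive secret sharing between two aggregators. It is not the authors' implementation: the modulus, the sampling rate, the one-hot histogram encoding, and all function names are assumptions made for this example, and it leaves out anonymization, authentication, input validation, and differential privacy noise.

```python
import random
import secrets

# Assumed modulus for the additive shares (hypothetical choice for this sketch).
MODULUS = 2**61 - 1


def share(vector, modulus=MODULUS):
    """Split a vector into two additive shares that sum to it mod `modulus`."""
    leader = [secrets.randbelow(modulus) for _ in vector]
    helper = [(v - r) % modulus for v, r in zip(vector, leader)]
    return leader, helper


def add_into(acc, vec, modulus=MODULUS):
    """Coordinate-wise modular addition of two vectors."""
    return [(a + b) % modulus for a, b in zip(acc, vec)]


def one_hot(index, size):
    """Encode a histogram contribution as a one-hot vector."""
    return [1 if i == index else 0 for i in range(size)]


def sampled_aggregate(inputs, sample_rate, rng, modulus=MODULUS):
    """Aggregate shares from a random subset of devices.

    Each device decides locally (with probability `sample_rate`) whether to
    participate. Each aggregator only ever sees one share per device; the
    sum is revealed only when the two aggregate shares are combined.
    """
    dim = len(inputs[0])
    leader_sum = [0] * dim
    helper_sum = [0] * dim
    num_sampled = 0
    for vec in inputs:
        if rng.random() >= sample_rate:
            continue  # device opts out of this round
        leader_share, helper_share = share(vec, modulus)
        leader_sum = add_into(leader_sum, leader_share, modulus)
        helper_sum = add_into(helper_sum, helper_share, modulus)
        num_sampled += 1
    total = add_into(leader_sum, helper_sum, modulus)
    return total, num_sampled


rng = random.Random(0)
inputs = [one_hot(rng.randrange(4), 4) for _ in range(1000)]
histogram, num_sampled = sampled_aggregate(inputs, 0.1, rng)
```

Here `histogram` holds the counts over the sampled devices only, so its entries sum to `num_sampled`. Neither aggregate share alone reveals anything about the counts, since each is uniformly distributed modulo `MODULUS`.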

Paper Structure

This paper contains 28 sections, 5 theorems, 8 equations, and 8 figures.

Key Result

Theorem 1

[Bun:2016, mironov2017renyi, CanonneKS20] If $\mathcal{A}:\mathcal{D}^n\to\mathcal{S}$ is $\varepsilon$-DP, then for any $\alpha>1$, $\mathcal{A}$ satisfies $(\alpha, \frac{1}{2}\varepsilon^2\alpha)$-RDP. Conversely, for any $\delta\in(0,1]$, if $\mathcal{A}$ is $(\alpha, \varepsilon)$-RDP then it is $(\v
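
The first direction of the theorem can be checked numerically with a small helper, sketched below. The function names are hypothetical. Because the converse statement is truncated in this excerpt, the second helper uses the classical conversion from Mironov (2017), $\varepsilon + \frac{\log(1/\delta)}{\alpha-1}$, as an assumption; the bound in the paper may be tighter.

```python
import math


def pure_dp_to_rdp(epsilon, alpha):
    """RDP parameter implied by epsilon-DP at order alpha > 1.

    Implements the bound stated in the theorem: (alpha, 0.5 * epsilon^2 * alpha)-RDP.
    """
    if epsilon < 0 or alpha <= 1:
        raise ValueError("need epsilon >= 0 and alpha > 1")
    return 0.5 * epsilon**2 * alpha


def rdp_to_approx_dp(alpha, rdp_epsilon, delta):
    """Approximate-DP epsilon implied by (alpha, rdp_epsilon)-RDP.

    Assumption: uses the classical conversion from Mironov (2017),
    which may differ from the truncated bound in the theorem above.
    """
    if alpha <= 1 or not (0 < delta <= 1):
        raise ValueError("need alpha > 1 and delta in (0, 1]")
    return rdp_epsilon + math.log(1.0 / delta) / (alpha - 1)


# A 1-DP algorithm viewed at order alpha = 2.
rdp_eps = pure_dp_to_rdp(epsilon=1.0, alpha=2.0)
dp_eps = rdp_to_approx_dp(alpha=2.0, rdp_epsilon=rdp_eps, delta=1e-6)
```

In practice one would evaluate the conversion over a range of orders $\alpha$ and keep the smallest resulting $\varepsilon$ for the target $\delta$.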

Figures (8)

  • Figure 1: Expected Squared Error of a non-private baseline (NonPriv), Aggregation model (Agg) and Samplable Aggregation (SampAgg) on a histogram task, for a histogram on uniform ground truth distribution (Left), and Skewed ground truth (Middle, Right), with data-dependent sampling (also showing non-private Importance Sampling). The population size ($N$), the support size ($K$), the privacy parameter ($\varepsilon$), and the fraction of non-default values ($\gamma$) are shown. We plot the variance (expected squared error) of the algorithm against the number of devices sampled $M$. More details and additional plots are in the experiments section.
  • Figure 2: Expected Squared Error of a non-private baseline (NonPriv), Aggregation model (Agg) and Samplable Aggregation (SampAgg) on a histogram task, for varying values of tasks $T$ for a fixed total privacy budget.
  • Figure 3: Expected Squared Error on the distribution of non-zero values for a sparse histogram task, for varying parameter values. The plots include a naive non-private baseline (NonPrivUnif), non-private Importance Sampling (NonPrivImpSamp), Aggregation model (Agg) and Samplable Aggregation (SampAgg) for varying number of tasks $T$.
  • Figure 4: Schematic description of the communication pattern in the proposed protocol. The Client downloads a recipe (Step ①) and gets a rate-limited anonymous attestation token $\tau$ from the Rate-limited Attestation Service (Step ②). It uses (Step ③) an Anonymization Service to send secret shares (and proof shares) $M_1$ and $M_2$ to the Leader and Helper respectively, each encrypted using their respective encryption keys (Step ④). The Leader and Helper run a protocol to validate the contributions (Step ⑤), and compute the aggregate over the valid shares (Step ⑥).
  • Figure 5: Expected Squared Error of a non-private baseline (NonPriv), Aggregation model (Agg) and Samplable Aggregation (SampAgg) on a histogram task, for varying values of vocabulary size $K$ and tasks $T$ for a fixed total privacy budget.
  • ...and 3 more figures

Theorems & Definitions (15)

  • Definition 2.1: Hockey-stick divergence
  • Definition 2.2: Rényi divergence
  • Definition 2.3: Central DP
  • Theorem 1
  • Theorem 2: RDP Composition [mironov2017renyi]
  • Theorem 3: Advanced Composition [DRV10]
  • Theorem 4
  • Example 1: Gaussian and Subsampled Gaussian Mechanisms
  • Definition 2.4: Local randomizer
  • Example 2: Rappor$_{K}$.
  • ...and 5 more