
Distributed HDMM: Scalable, Distributed, Accurate, and Differentially Private Query Workloads without a Trusted Curator

Ratang Sedimo, Ivoline C. Ngong, Jami Lashua, Joseph P. Near

TL;DR

Distributed HDMM addresses the challenge of answering high-dimensional query workloads under differential privacy without a trusted curator by combining the central-model High-Dimensional Matrix Mechanism (HDMM) with secure aggregation. The server broadcasts an optimized strategy matrix, each client computes a local HDMM measurement and adds discrete Gaussian noise, and the contributions are securely aggregated to reconstruct private workload answers, preserving privacy under semi-honest and malicious threat models. The authors provide formal privacy guarantees, analyze computation and communication costs, and demonstrate scalability to thousands of clients with utility close to central HDMM, outperforming local and shuffle-based baselines. The work also includes an open-source implementation and an empirical evaluation on Census SF1 and Adult workloads, highlighting practical deployment considerations for federated settings.
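The pipeline described above can be sketched as a single-process simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `sample_discrete_gaussian` and `simulate_distributed_hdmm` are hypothetical, the noise sampler is a truncated enumeration that only approximates an exact discrete Gaussian sampler, splitting the noise scale evenly as `sigma2 / n` per client is an assumption, secure aggregation is replaced by a plain sum, and reconstruction uses least squares via the pseudo-inverse of the strategy matrix `A`.

```python
import numpy as np


def sample_discrete_gaussian(sigma2, size, rng):
    """Approximate discrete Gaussian sampler (an assumption, not an exact
    sampler): enumerate the integers in [-T, T] with T = ceil(12 * sigma) + 1
    and draw k with probability proportional to exp(-k^2 / (2 * sigma2)).
    Requires sigma2 > 0."""
    sigma = np.sqrt(sigma2)
    T = int(np.ceil(12 * sigma)) + 1
    support = np.arange(-T, T + 1)
    weights = np.exp(-(support.astype(float) ** 2) / (2 * sigma2))
    return rng.choice(support, size=size, p=weights / weights.sum())


def simulate_distributed_hdmm(client_vectors, A, W, sigma2, rng):
    """Simulate one round. A is the integer strategy matrix the server
    broadcasts, W is the workload matrix, and sigma2 is the total noise scale."""
    n = len(client_vectors)
    contributions = []
    for x in client_vectors:
        # Local HDMM measurement of this client's data vector x.
        local = A @ x
        # Each client adds its share of the noise (scale sigma2 / n).
        noise = sample_discrete_gaussian(sigma2 / n, A.shape[0], rng)
        contributions.append(local + noise)
    # A secure aggregation protocol would reveal only this sum; here we add in the clear.
    y = np.sum(contributions, axis=0)
    # Least-squares reconstruction of the data vector, then the workload answers.
    x_hat = np.linalg.pinv(A) @ y
    return W @ x_hat
```

For example, with a hypothetical strategy `A` that stacks the identity on a total-count row and a prefix-sum workload `W = np.tril(np.ones((4, 4)))`, the answers approach the true prefix sums as `sigma2` shrinks. Note that a sum of independent discrete Gaussians is not itself exactly a discrete Gaussian; Lemma 3 (Kairouz et al., 2021) is the kind of result used to account for this.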

Abstract

We present the Distributed High-Dimensional Matrix Mechanism (Distributed HDMM), a protocol for answering workloads of linear queries on distributed data that provides the accuracy of central-model HDMM without a trusted curator. Distributed HDMM leverages a secure aggregation protocol to evaluate HDMM on distributed data, and is secure against a malicious aggregator and malicious clients (assuming an honest majority). Our preliminary empirical evaluation shows that Distributed HDMM can run on realistic datasets and workloads with thousands of clients in under one minute.
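To illustrate what "secure aggregation" buys here, the following toy uses additive secret sharing modulo a hypothetical modulus `Q` across `k` non-colluding servers: each server sees only uniformly random shares, yet the combined server totals equal the sum of the clients' noisy measurements. This is an illustrative stand-in with only semi-honest security; it does not reproduce the paper's protocol or its protections against malicious parties. The names `Q`, `share`, and `secure_sum` are hypothetical.

```python
import numpy as np

Q = 2 ** 32  # hypothetical modulus for arithmetic on shares


def share(vec, k, rng):
    """Split an integer vector into k additive shares modulo Q."""
    vec = np.asarray(vec, dtype=np.int64) % Q
    shares = [rng.integers(0, Q, size=vec.shape, dtype=np.int64) for _ in range(k - 1)]
    # The last share makes all k shares sum to vec modulo Q.
    last = (vec - sum(shares)) % Q
    return shares + [last]


def secure_sum(client_vectors, k, rng):
    """Each of k servers sums the j-th shares of all clients; combining the
    k server totals reveals only the aggregate, decoded as signed integers."""
    per_client = [share(v, k, rng) for v in client_vectors]
    server_totals = [sum(c[j] for c in per_client) % Q for j in range(k)]
    total = sum(server_totals) % Q
    # Values in the upper half of [0, Q) represent negative numbers (noise can be negative).
    return np.where(total >= Q // 2, total - Q, total)
```

The signed decoding matters because noisy counts can be negative; it is correct as long as the true sum lies in `[-Q/2, Q/2)`.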

Paper Structure

This paper contains 46 sections, 7 theorems, 8 equations, 10 figures, 2 tables, 3 algorithms.

Key Result

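Proposition 1 (Bun and Steinke, 2016)

The Gaussian mechanism, which adds $\mathcal{N}(0, \sigma^2)$ noise to a query with $L_2$-sensitivity $\Delta_2$, satisfies $\rho$-zCDP when $\sigma^2 = \Delta_2^2 / (2\rho)$.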
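The calibration in Proposition 1 can be made concrete for a strategy matrix. This is a minimal sketch, assuming add/remove-one-record neighbouring datasets over a data vector of counts (so one record changes the vector by a single unit vector, and the $L_2$-sensitivity of $x \mapsto Ax$ is the largest column norm of $A$). It covers the continuous, central-model Gaussian case; the distributed protocol's discrete Gaussian analysis is a separate result. The helper names `l2_sensitivity` and `gaussian_sigma2` are hypothetical.

```python
import numpy as np


def l2_sensitivity(A):
    """Adding or removing one record changes x by a unit vector e_j, so
    ||A x - A x'||_2 is at most the largest column L2 norm of A."""
    return float(np.max(np.linalg.norm(A, axis=0)))


def gaussian_sigma2(rho, l2_sens):
    """Adding N(0, sigma^2) noise to a query with L2-sensitivity Delta gives
    (Delta^2 / (2 sigma^2))-zCDP, so sigma^2 = Delta^2 / (2 rho) yields rho-zCDP."""
    if rho <= 0:
        raise ValueError("rho must be positive")
    return l2_sens ** 2 / (2 * rho)
```

For the identity-plus-total strategy used as an example above, every column has norm $\sqrt{2}$, so $\rho = 0.5$ gives $\sigma^2 = 2$.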

Figures (10)

  • Figure 1: Overview of the Distributed HDMM approach.
  • Figure 2: Distributed HDMM functionality $\mathcal{F}_{\text{HDMM}}$.
  • Figure 3: Distributed HDMM Protocol $\Pi_{\text{HDMM}}$.
  • Figure 4: Server and Client (Average) Computation Time (Bandwidth = Unlimited)
  • Figure 5: Total Protocol Running Time (s), semi-honest security
  • ...and 5 more figures

Theorems & Definitions (11)

  • Definition 1: Differential privacy
  • Definition 2: Zero-concentrated differential privacy (zCDP) (Bun and Steinke, 2016)
  • Definition 3: $L_2$-sensitivity
  • Definition 4: Gaussian mechanism for zCDP
  • Proposition 1: Gaussian mechanism satisfies zCDP (Bun and Steinke, 2016)
  • Lemma 1: Composition (Bun and Steinke, 2016)
  • Lemma 2: Post-processing (Bun and Steinke, 2016)
  • Proposition 2: Conversion to $(\varepsilon,\delta)$-DP (Bun and Steinke, 2016)
  • Lemma 3: Distributed discrete Gaussian (Kairouz et al., 2021)
  • Theorem 1
  • ...and 1 more
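Composition and conversion are what turn per-round zCDP guarantees into a reportable $(\varepsilon,\delta)$ budget. This sketch assumes Lemma 1 is the standard additive zCDP composition and Proposition 2 is the standard conversion of Bun and Steinke (2016), under which $\rho$-zCDP implies $(\rho + 2\sqrt{\rho \ln(1/\delta)}, \delta)$-DP for every $\delta > 0$; the function names `compose` and `zcdp_to_approx_dp` are hypothetical.

```python
import math


def compose(*rhos):
    """zCDP composition: running rho_1-, ..., rho_k-zCDP mechanisms on the
    same data satisfies (rho_1 + ... + rho_k)-zCDP."""
    return sum(rhos)


def zcdp_to_approx_dp(rho, delta):
    """Conversion: rho-zCDP implies (eps, delta)-DP for any 0 < delta < 1,
    with eps = rho + 2 * sqrt(rho * ln(1 / delta))."""
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))
```

For example, three measurement rounds at $\rho = 0.1, 0.2, 0.2$ compose to $\rho = 0.5$, which can then be reported as an $(\varepsilon, 10^{-5})$ guarantee via `zcdp_to_approx_dp(0.5, 1e-5)`.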