Table of Contents
Fetching ...

Optimal Private and Communication Constraint Distributed Goodness-of-Fit Testing for Discrete Distributions in the Large Sample Regime

Lasse Vuursteen

Abstract

We study distributed goodness-of-fit testing for discrete distribution under bandwidth and differential privacy constraints. Information constraint distributed goodness-of-fit testing is a problem that has received considerable attention recently. The important case of discrete distributions is theoretically well understood in the classical case where all data is available in one "central" location. In a federated setting, however, data is distributed across multiple "locations" (e.g. servers) and cannot readily be shared due to e.g. bandwidth or privacy constraints that each server needs to satisfy. We show how recently derived results for goodness-of-fit testing for the mean of a multivariate Gaussian model extend to the discrete distributions, by leveraging Le Cam's theory of statistical equivalence. In doing so, we derive matching minimax upper- and lower-bounds for the goodness-of-fit testing for discrete distributions under bandwidth or privacy constraints in the regime where the number of samples held locally is large.

Optimal Private and Communication Constraint Distributed Goodness-of-Fit Testing for Discrete Distributions in the Large Sample Regime

Abstract

We study distributed goodness-of-fit testing for discrete distribution under bandwidth and differential privacy constraints. Information constraint distributed goodness-of-fit testing is a problem that has received considerable attention recently. The important case of discrete distributions is theoretically well understood in the classical case where all data is available in one "central" location. In a federated setting, however, data is distributed across multiple "locations" (e.g. servers) and cannot readily be shared due to e.g. bandwidth or privacy constraints that each server needs to satisfy. We show how recently derived results for goodness-of-fit testing for the mean of a multivariate Gaussian model extend to the discrete distributions, by leveraging Le Cam's theory of statistical equivalence. In doing so, we derive matching minimax upper- and lower-bounds for the goodness-of-fit testing for discrete distributions under bandwidth or privacy constraints in the regime where the number of samples held locally is large.

Paper Structure

This paper contains 17 sections, 21 theorems, 119 equations.

Key Result

Theorem 1

Consider sequences $m \equiv m_\nu$, $b \equiv b_\nu$, $d \equiv d_\nu$ and $n \equiv n_\nu$ such that $md \to \infty$ whilst Suppose that $\rho \equiv \rho_\nu$ is a nonnegative sequence satisfying Then, When considering the class of only local randomness protocols (i.e. replacing $\mathscr{T}_{\text{SR}}^{(b)}$ with $\mathscr{T}_{\text{LR}}^{(b)}$ in the above display), the minimax separation

Theorems & Definitions (39)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1
  • Theorem 2
  • Lemma 1
  • Theorem 3
  • Lemma 2
  • Lemma 3
  • proof
  • ...and 29 more