
Optimized Tradeoffs for Private Prediction with Majority Ensembling

Shuli Jiang, Qiuyi Zhang, Gauri Joshi

TL;DR

The Data-dependent Randomized Response Majority (DaRRM) algorithm is introduced. Parameterized by a data-dependent noise function $\gamma$, it enables efficient utility optimization over the class of all private algorithms, which encompasses standard methods such as subsampling and randomized response.

Abstract

We study a classical problem in private prediction: computing an $(mε, δ)$-differentially private majority of $K$ $(ε, Δ)$-differentially private algorithms, for $1 \leq m \leq K$ and $1 > δ \geq Δ \geq 0$. Standard methods such as subsampling or randomized response are widely used, but do they provide optimal privacy-utility tradeoffs? To answer this, we introduce the Data-dependent Randomized Response Majority (DaRRM) algorithm. It is parameterized by a data-dependent noise function $γ$ and enables efficient utility optimization over the class of all private algorithms, encompassing those standard methods. We show that the utility of an $(mε, δ)$-private majority algorithm can be maximized tractably by solving an optimization problem for any $m \leq K$, via a novel structural result that reduces the infinitely many privacy constraints to a polynomially sized set. In some settings, we show that DaRRM provably enjoys a privacy gain of a factor of 2 over common baselines, with fixed utility. Lastly, we demonstrate the strong empirical effectiveness of our first-of-its-kind privacy-constrained utility optimization for ensembling labels for private prediction from private teachers in image classification. Notably, our DaRRM framework with an optimized $γ$ exhibits substantial utility gains when compared against several baselines.
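A minimal Python sketch of the DaRRM idea, under stated assumptions: the votes are binary, $S_i \in \{0, 1\}$, and DaRRM takes a randomized-response form in which, with $L = \sum_i S_i$, it outputs the non-private majority $\mathbb{I}\{L/K \geq 1/2\}$ with probability $\gamma(L)$ and a fair coin otherwise. The function name `darrm_sample` and its parameters are hypothetical. The sketch only shows how $\gamma$ is used; all of the privacy guarantee comes from choosing $\gamma$, which it does not attempt.

```python
import random
from typing import Callable, Sequence


def darrm_sample(
    votes: Sequence[int],
    gamma: Callable[[int], float],
    rng: random.Random,
) -> int:
    """One draw of a DaRRM-style private majority over binary votes.

    Assumption (not the authors' code): votes[i] in {0, 1} is an already-drawn
    sample S_i ~ M_i(D), and gamma maps the vote count L to [0, 1].
    """
    K = len(votes)
    L = sum(votes)                      # number of 1-votes
    p_gamma = gamma(L)                  # data-dependent probability of answering with the true majority
    if not 0.0 <= p_gamma <= 1.0:
        raise ValueError("gamma must map into [0, 1]")
    if rng.random() < p_gamma:
        return 1 if 2 * L >= K else 0   # non-private majority I{L / K >= 1/2}
    return rng.randint(0, 1)            # otherwise output a fair coin flip


# Usage: a constant gamma is ordinary randomized response applied to the majority.
rng = random.Random(0)
print(darrm_sample([1, 1, 0, 1, 0], lambda L: 0.5, rng))
```

With a constant $\gamma$ the noise level ignores the vote count; letting $\gamma$ depend on $L$ is the extra freedom the abstract describes optimizing over.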

Paper Structure

This paper contains 40 sections, 25 theorems, 140 equations, 10 figures, 10 tables, 6 algorithms.

Key Result

Theorem 2.2

For any ${\epsilon} > 0$ and $\delta \in [0,1]$, the class of $({\epsilon}, \delta)$-differentially private mechanisms satisfies $(k{\epsilon}, k\delta)$-differential privacy under $k$-fold adaptive composition.
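A worked instance of Theorem 2.2 in this setting: releasing all $K$ samples $S_1, \dots, S_K$ is a $K$-fold composition (non-adaptive, a special case of adaptive), so the exact majority $g(S_1, \dots, S_K)$, being post-processing, is $(K\epsilon, K\Delta)$-differentially private by this bound. That is the trivial end of the range $1 \leq m \leq K$. The helper name and the numbers below are hypothetical.

```python
def simple_composition(eps: float, delta: float, k: int) -> tuple[float, float]:
    """Privacy parameters after k-fold composition per Theorem 2.2.

    Each of the k mechanisms is assumed to be (eps, delta)-DP.
    """
    if eps <= 0 or not 0.0 <= delta <= 1.0 or k < 1:
        raise ValueError("need eps > 0, delta in [0, 1], k >= 1")
    return k * eps, k * delta


# Hypothetical numbers: K = 11 private teachers, each (0.1, 1e-6)-DP.
K, eps, Delta = 11, 0.1, 1e-6
total_eps, total_delta = simple_composition(eps, Delta, K)
print(f"majority of all {K} votes: ({total_eps:.2f}, {total_delta:.1e})-DP")
```

The question the paper studies is how much utility can be kept when the budget is only $m\epsilon$ for $m < K$.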

Figures (10)

  • Figure 1: An illustration of the problem setting. The inputs are the dataset ${\mathcal{D}}$ and $K$ $({\epsilon}, \Delta)$-differentially private mechanisms $M_1,\dots, M_K$. One draws samples $S_i \sim M_i({\mathcal{D}})$ and computes an aggregated output $g(S_1,\dots, S_K)$ based on all observed samples. Our goal is to design a randomized algorithm ${\mathcal{A}}$ that approximately computes $g$ and is $(m{\epsilon}, \delta)$-differentially private for $1\leq m \leq K$ and $\delta \geq \Delta \geq 0$. We focus on $g$ being the majority function.
  • Figure 2: Plots of the shape and ${\mathcal{E}}(\textsf{DaRRM}_{\gamma})$ of different $\gamma$ functions: the optimized $\gamma_{opt}$
  • Figure 3: A visualization of the above LP problem.
  • Figure 4: The feasible region ${\mathcal{F}}$ is plotted as the blue area. The four boundaries are implied by $p, p'$ satisfying ${\epsilon}$-differential privacy.
  • Figure 5: An illustration of the feasible region ${\mathcal{F}}_i$.
  • ...and 5 more figures
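Figure 4's caption refers to four boundaries implied by $p, p'$ satisfying $\epsilon$-differential privacy. Assuming $p = \Pr[M(\mathcal{D}) = 1]$ and $p' = \Pr[M(\mathcal{D}') = 1]$ for a binary-output mechanism on neighboring datasets (our reading of the notation, not stated in the caption), pure $\epsilon$-DP ($\delta = 0$) is exactly the four inequalities checked below. The name `in_feasible_region` and the floating-point tolerance `tol` are hypothetical.

```python
import math


def in_feasible_region(p: float, p_prime: float, eps: float, tol: float = 1e-12) -> bool:
    """Check the four pure eps-DP constraints for a binary-output mechanism.

    Assumption: p = Pr[M(D) = 1] and p_prime = Pr[M(D') = 1] on neighboring D, D'.
    tol absorbs floating-point rounding for points lying exactly on a boundary.
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= p_prime <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    bound = math.exp(eps)
    return (
        p <= bound * p_prime + tol                        # output 1, D vs D'
        and p_prime <= bound * p + tol                    # output 1, D' vs D
        and (1 - p) <= bound * (1 - p_prime) + tol        # output 0, D vs D'
        and (1 - p_prime) <= bound * (1 - p) + tol        # output 0, D' vs D
    )


# Single-bit randomized response (flip with probability 1 / (1 + e^eps))
# sits exactly on the boundary p = e^eps * p'.
e = math.exp(1.0)
print(in_feasible_region(e / (1 + e), 1 / (1 + e), eps=1.0))
```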

Theorems & Definitions (41)

  • Definition 2.1: Differential Privacy (DP) dwork2014algorithmic
  • Theorem 2.2: Simple Composition dwork2014algorithmic
  • Theorem 2.3: General Composition (Theorem 3.4 of kairouz2015composition)
  • Definition 2.4: Error Metric and Utility Metric
  • Lemma 3.1
  • Lemma 3.2: Lower Bound on Error when $m = 1$
  • Lemma 3.3: Generality of $\textsf{DaRRM}$
  • Lemma 3.4: $\gamma$ privacy condition
  • Theorem 4.1: Provable Privacy Amplification by 2
  • Lemma 5.1
  • ...and 31 more
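Any privacy condition on $\gamma$, such as Lemma 3.4, is ultimately a statement about how likely the algorithm is to output 1 on neighboring datasets. A hedged sketch of that computation, assuming (i) DaRRM$_\gamma$ outputs $\mathbb{I}\{L/K \geq 1/2\}$ with probability $\gamma(L)$ and a fair coin otherwise, and (ii) independent $S_i \sim \mathrm{Bernoulli}(p_i)$, so that with $\alpha_l = \Pr[L = l]$:

$$\Pr[\textsf{DaRRM}_\gamma(\mathcal{D}) = 1] = \sum_{l=0}^{K} \alpha_l \left( \gamma(l)\, \mathbb{I}\{l \geq K/2\} + \tfrac{1 - \gamma(l)}{2} \right).$$

It evaluates the pure-DP privacy loss for one fixed pair $(p_1, \dots, p_K)$ and $(p'_1, \dots, p'_K)$ only; it is not the paper's optimization, which must hold over all admissible pairs. All function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def count_distribution(probs: Sequence[float]) -> list[float]:
    """Pr[L = l] for L = sum of independent Bernoulli(probs[i]) (Poisson binomial)."""
    dist = [1.0]
    for q in probs:
        nxt = [0.0] * (len(dist) + 1)
        for l, mass in enumerate(dist):
            nxt[l] += mass * (1 - q)      # this vote is 0
            nxt[l + 1] += mass * q        # this vote is 1
        dist = nxt
    return dist


def darrm_prob_one(probs: Sequence[float], gamma: Callable[[int], float]) -> float:
    """Pr[output = 1] under the assumed DaRRM form, summing over L = l."""
    K = len(probs)
    total = 0.0
    for l, alpha in enumerate(count_distribution(probs)):
        majority = 1.0 if 2 * l >= K else 0.0
        g = gamma(l)
        total += alpha * (g * majority + (1 - g) * 0.5)
    return total


def privacy_loss(
    probs: Sequence[float],
    probs_prime: Sequence[float],
    gamma: Callable[[int], float],
) -> float:
    """Largest absolute log-ratio over both outputs (pure DP, delta = 0)."""
    p = darrm_prob_one(probs, gamma)
    pp = darrm_prob_one(probs_prime, gamma)
    if not (0.0 < p < 1.0 and 0.0 < pp < 1.0):
        raise ValueError("both outputs need positive probability, e.g. keep gamma(l) < 1")
    return max(abs(math.log(p / pp)), abs(math.log((1 - p) / (1 - pp))))


# Hypothetical neighboring pair: only the first teacher's probability changes.
print(privacy_loss([0.3, 0.6, 0.8], [0.4, 0.6, 0.8], lambda l: 0.5))
```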