
Subset-Based Instance Optimality in Private Estimation

Travis Dick, Alex Kulesza, Ziteng Sun, Ananda Theertha Suresh

TL;DR

This work introduces subset-based instance optimality for differentially private estimation. It is a strong benchmark that compares an algorithm against the best private algorithm evaluated on large subsets of the dataset, rather than on datasets that may contain added, potentially extreme points. It develops a practical private mean-estimation method built from private thresholding and the inverse sensitivity mechanism. The method's guarantee matches or exceeds the asymptotic rates of existing algorithms under a range of distributional assumptions. The authors extend the approach to a wider class of monotone properties, including means, medians, quantiles, and $\ell_p$-norm minimizers. They also analyze its distribution-specific performance in the SME setting for subgaussian and bounded-moment families, and note that it extends to higher dimensions. The results give near-minimax optimality without requiring strong prior knowledge of the data distribution, which makes private estimation more robust and more broadly applicable to real data.
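The inverse sensitivity mechanism named above can be sketched in Python for a single rank threshold, such as a median or quantile. This is a generic illustration, not the paper's SubsetOptimalMean algorithm. The sketch rests on these assumptions:

- the data lie in a known range $[-R, R]$;
- neighboring datasets differ by replacing one point;
- a threshold $t$ has rank $r$ when at most $r$ points lie strictly below $t$ and at least $r$ lie at or below it.

The function name `inverse_sensitivity_quantile` is hypothetical. The score of $t$ is the number of points that would have to move for $t$ to reach the target rank. That score changes by at most 1 between neighboring datasets. Sampling with density proportional to $\exp(-\varepsilon \cdot \text{score}/2)$ is therefore an instance of the exponential mechanism, which makes the sketch $\varepsilon$-DP under these assumptions.

```python
import math
import random


def inverse_sensitivity_quantile(data, R, epsilon, target_rank, rng):
    """Hypothetical helper: sample a private rank-`target_rank` threshold in [-R, R].

    Assumes a threshold t has rank r when #{x < t} <= r <= #{x <= t}.
    Generic inverse sensitivity (exponential) mechanism, not the paper's algorithm.
    """
    # Clip to the assumed range and sort; boundaries are b_0 = -R, b_1..b_n, b_{n+1} = R.
    pts = sorted(min(max(x, -R), R) for x in data)
    bounds = [-R] + pts + [R]

    weights = []
    for j in range(len(bounds) - 1):
        length = bounds[j + 1] - bounds[j]
        # Every t strictly inside (b_j, b_{j+1}) has exactly j points below it, so
        # exactly |j - target_rank| points must move for t to reach the target rank.
        score = abs(j - target_rank)
        weights.append(length * math.exp(-epsilon * score / 2))

    # The score is constant on each interval: pick an interval with probability
    # proportional to its weight, then return a uniform point inside it.
    j = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return rng.uniform(bounds[j], bounds[j + 1])


# Usage: a private median-like threshold (rank 2 of 5 points).
rng = random.Random(0)
est = inverse_sensitivity_quantile([-1.0, -0.5, 0.0, 0.5, 1.0], R=10.0,
                                   epsilon=50.0, target_rank=2, rng=rng)
# With this large epsilon, est lands in (-0.5, 0) with overwhelming probability.
```

Sampling an interval first and then a uniform point inside it is exact here because the score only changes at data points. With a very large $\varepsilon$ and tied data, every weight can underflow to zero. The sketch does not guard against that case.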

Abstract

We propose a new definition of instance optimality for differentially private estimation algorithms. Our definition requires an optimal algorithm to compete, simultaneously for every dataset $D$, with the best private benchmark algorithm that (a) knows $D$ in advance and (b) is evaluated by its worst-case performance on large subsets of $D$. That is, the benchmark algorithm need not perform well when potentially extreme points are added to $D$; it only has to handle the removal of a small number of real data points that already exist. This makes our benchmark significantly stronger than those proposed in prior work. We nevertheless show, for real-valued datasets, how to construct private algorithms that achieve our notion of instance optimality when estimating a broad class of dataset properties, including means, quantiles, and $\ell_p$-norm minimizers. For means in particular, we provide a detailed analysis and show that our algorithm simultaneously matches or exceeds the asymptotic performance of existing algorithms under a range of distributional assumptions.
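The benchmark described in words above can be written as a formula. The following is one hedged formalization, not the paper's exact definition. It assumes:

- $\theta$ is the target property (for example, the mean);
- $k$ is the number of points the benchmark must tolerate removing;
- the error is measured as expected absolute error;
- $\alpha$ is an approximation factor.

The symbols $k$, $\alpha$ and $\mathrm{OPT}_k$ are illustrative. The paper may use a different error measure, subset size, or additional slack terms.

```latex
% Hedged sketch; k, alpha and OPT_k are illustrative symbols, not taken from the paper.
\mathrm{OPT}_{k}(D) \;=\; \inf_{\mathcal{A}'\ \varepsilon\text{-DP}} \;
  \max_{D' \subseteq D,\; |D \setminus D'| \le k} \;
  \mathbb{E}\big[\,|\mathcal{A}'(D') - \theta(D')|\,\big]

% A is instance optimal (sketch) if, simultaneously for every dataset D,
\mathbb{E}\big[\,|\mathcal{A}(D) - \theta(D)|\,\big] \;\le\; \alpha \cdot \mathrm{OPT}_{k}(D)
```

Because the infimum is taken separately for each $D$, the benchmark $\mathcal{A}'$ may be tailored to $D$, which captures "knows $D$ in advance". The inner maximum ranges only over subsets of $D$, so the benchmark never has to cope with added points.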

Paper Structure

This paper contains 30 sections, 17 theorems, 105 equations, 1 figure, 1 table, and 2 algorithms.

Key Result

Theorem 1

Let $D \subset [-R,R]$ be a multiset of points and let $\hat{\mu}$ be the output of Algorithm SubsetOptimalMean with parameters $R$ and $\varepsilon > 0$. Publishing $\hat{\mu}$ is $3\varepsilon$-differentially private, and for any $\gamma > 0$, we have

Figures (1)

  • Figure 1: Example of ranks as defined in the paper's definition of rank. There are 4 points, with $x_2 = x_3$. For each rank $r \in \{0, \dots, 4\}$ we show the interval of points that are rank-$r$ thresholds. For $r = 0, 1, \dots, 4$, the intervals of rank-$r$ thresholds are given by $I_0 = [-\infty, x_1]$, $I_1 = [x_1, x_2]$, $I_2 = \{x_2\}$, $I_3 = [x_3, x_4]$, and $I_4 = [x_4, \infty]$, respectively.
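The intervals in Figure 1 can be reproduced in a few lines of Python. The sketch assumes that $t$ is a rank-$r$ threshold when $\#\{x < t\} \le r \le \#\{x \le t\}$. This condition is inferred from the figure; the paper's formal definition is not reproduced in this summary. Under that assumption, the rank-$r$ thresholds form the interval $[x_r, x_{r+1}]$ of order statistics, with $x_0 = -\infty$ and $x_{n+1} = +\infty$. Both function names are hypothetical.

```python
import math


def is_rank_threshold(points, t, r):
    """Hypothetical helper: True if #{x < t} <= r <= #{x <= t} (assumed definition)."""
    below = sum(1 for x in points if x < t)
    at_or_below = sum(1 for x in points if x <= t)
    return below <= r <= at_or_below


def rank_interval(points, r):
    """Hypothetical helper: closed interval [lo, hi] of all rank-r thresholds, 0 <= r <= n."""
    xs = sorted(points)
    n = len(xs)
    if not 0 <= r <= n:
        raise ValueError("rank must lie in 0..n")
    # With 1-indexed order statistics x_1 <= ... <= x_n this is [x_r, x_{r+1}],
    # using -inf for x_0 and +inf for x_{n+1}.
    lo = -math.inf if r == 0 else xs[r - 1]
    hi = math.inf if r == n else xs[r]
    return lo, hi


# Four points with x_2 = x_3, mirroring Figure 1.
pts = [1.0, 2.0, 2.0, 3.0]
for r in range(len(pts) + 1):
    print(r, rank_interval(pts, r))
# 0 (-inf, 1.0)
# 1 (1.0, 2.0)
# 2 (2.0, 2.0)
# 3 (2.0, 3.0)
# 4 (3.0, inf)
```

The tie $x_2 = x_3$ collapses $I_2$ to the single point $\{x_2\}$, exactly as in the figure.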

Theorems & Definitions (36)

  • Definition 1: Differential privacy
  • Definition 2
  • Remark 1
  • Remark 2
  • Theorem 1
  • Lemma 1
  • Lemma 2
  • Definition 3
  • Definition 4
  • Definition 5
  • ...and 26 more