Subset-Based Instance Optimality in Private Estimation

Travis Dick; Alex Kulesza; Ziteng Sun; Ananda Theertha Suresh

TL;DR

This work introduces subset-based instance optimality for differentially private estimation: an algorithm is compared, on every dataset, against the best private benchmark evaluated on large subsets of that dataset, rather than on datasets that may contain added extreme points. The authors build a private mean-estimation method from private thresholding and the inverse sensitivity mechanism, with a guarantee that matches or exceeds prior rates under a range of distributional assumptions. They extend the approach to a broader class of monotone properties, including means, medians, quantiles, and $\ell_p$-minimizers, and derive distribution-specific guarantees for statistical mean estimation over subgaussian and bounded-moment families, with a possible extension to higher dimensions. The resulting guarantees are near-minimax optimal without requiring strong prior knowledge of the distribution.

Abstract

We propose a new definition of instance optimality for differentially private estimation algorithms. Our definition requires an optimal algorithm to compete, simultaneously for every dataset $D$, with the best private benchmark algorithm that (a) knows $D$ in advance and (b) is evaluated by its worst-case performance on large subsets of $D$. That is, the benchmark algorithm need not perform well when potentially extreme points are added to $D$; it only has to handle the removal of a small number of real data points that already exist. This makes our benchmark significantly stronger than those proposed in prior work. We nevertheless show, for real-valued datasets, how to construct private algorithms that achieve our notion of instance optimality when estimating a broad class of dataset properties, including means, quantiles, and $\ell_p$-norm minimizers. For means in particular, we provide a detailed analysis and show that our algorithm simultaneously matches or exceeds the asymptotic performance of existing algorithms under a range of distributional assumptions.
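To make the contrast between removing existing points and adding extreme points concrete, here is a minimal Python sketch for the mean. It is only an illustration, not the paper's formal benchmark or algorithm: the function names, the budget `k`, and the bound `R` are assumptions introduced here. It compares how far the empirical mean can move when at most `k` real points are removed with how far it can move when `k` points at $\pm R$ are added.

```python
def max_mean_shift_by_removal(data, k):
    """Largest |mean(D') - mean(D)| over D' obtained by removing 1..k points of D.

    For the mean, the extreme cases are dropping the j smallest or the
    j largest points, so only those subsets need to be checked.
    """
    n = len(data)
    if not 0 <= k < n:
        raise ValueError("k must satisfy 0 <= k < len(data)")
    xs = sorted(data)
    full_mean = sum(xs) / n
    best = 0.0
    for j in range(1, k + 1):
        mean_without_smallest = sum(xs[j:]) / (n - j)
        mean_without_largest = sum(xs[: n - j]) / (n - j)
        best = max(best,
                   mean_without_smallest - full_mean,
                   full_mean - mean_without_largest)
    return best


def max_mean_shift_by_addition(data, k, R):
    """Largest |mean(D') - mean(D)| when k points at +R or at -R are added to D.

    Assumes every point of D lies in [-R, R].
    """
    n = len(data)
    total = sum(data)
    full_mean = total / n
    shift_up = (total + k * R) / (n + k) - full_mean
    shift_down = full_mean - (total - k * R) / (n + k)
    return max(shift_up, shift_down)


data = [0.0, 1.0, 2.0, 3.0]
removal_shift = max_mean_shift_by_removal(data, k=1)
addition_shift = max_mean_shift_by_addition(data, k=1, R=100.0)
print(removal_shift, addition_shift)
```

On tightly clustered data with a large `R`, the removal shift is far smaller than the addition shift, which is the intuition behind evaluating the benchmark only on large subsets of $D$.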

Paper Structure (30 sections, 17 theorems, 105 equations, 1 figure, 1 table, 2 algorithms)

Introduction
Problem formulation
Our Contributions.
Subset-based instance optimality.
Improvement on mean estimation.
Related Work
Instance-optimality in private estimation.
Private statistical mean estimation.
Subset-Optimal Private Means
Private Thresholds
Mean Estimation
Intuition
Instance-optimal algorithm for monotone properties
Implications on private statistical mean estimation.
Subgaussian distributions.
...and 15 more sections

Key Result

Theorem 1

Let $D \subset [-R,R]$ be a multiset of points and let $\hat{\mu}$ be the output of the SubsetOptimalMean algorithm with parameters $R$ and $\varepsilon > 0$. Publishing $\hat{\mu}$ is $3\varepsilon$-differentially private, and for any $\gamma > 0$, we have

Figures (1)

Figure 1: Example of ranks as defined in the rank definition (defn:rank). There are 4 points with $x_2 = x_3$. For each rank $r \in \{0, \dots, 4\}$, the figure shows the interval $I_r$ of points that are rank-$r$ thresholds: $I_0 = [-\infty, x_1]$, $I_1 = [x_1, x_2]$, $I_2 = \{x_2\}$, $I_3 = [x_3, x_4]$, and $I_4 = [x_4, \infty]$.
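The intervals in the caption can be reproduced with a short Python sketch. The rank definition itself is not part of this excerpt, so the rule below is inferred from the caption alone and should be read as an assumption: for sorted points $x_1 \le \dots \le x_n$, the rank-$r$ thresholds form the closed interval $[x_r, x_{r+1}]$, with $x_0 = -\infty$ and $x_{n+1} = \infty$. The function name `rank_threshold_interval` is hypothetical.

```python
import math


def rank_threshold_interval(points, r):
    """Return (lo, hi), the closed interval of rank-r thresholds.

    Assumed rule (inferred from the Figure 1 caption): with the points
    sorted as x_1 <= ... <= x_n, rank r corresponds to [x_r, x_{r+1}],
    using x_0 = -inf and x_{n+1} = +inf.
    """
    xs = sorted(points)
    n = len(xs)
    if not 0 <= r <= n:
        raise ValueError("r must satisfy 0 <= r <= len(points)")
    lo = -math.inf if r == 0 else xs[r - 1]
    hi = math.inf if r == n else xs[r]
    return lo, hi


# Four points with x_2 = x_3, mirroring the setup in the caption.
points = [1.0, 2.0, 2.0, 5.0]
intervals = [rank_threshold_interval(points, r) for r in range(len(points) + 1)]
for r, (lo, hi) in enumerate(intervals):
    print(f"I_{r} = [{lo}, {hi}]")
```

Because $x_2 = x_3$, the rank-2 interval collapses to a single point, matching $I_2 = \{x_2\}$ in the caption.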

Theorems & Definitions (36)

Definition 1: Differential privacy
Definition 2
Remark 1
Remark 2
Theorem 1
Lemma 1
Lemma 2
Definition 3
Definition 4
Definition 5
...and 26 more
