Subset-Based Instance Optimality in Private Estimation
Travis Dick, Alex Kulesza, Ziteng Sun, Ananda Theertha Suresh
TL;DR
This work introduces subset-based instance optimality for differentially private estimation, a framework that compares an algorithm against the best private benchmark evaluated on large subsets of the given dataset, rather than on neighboring datasets that may contain added extreme points. It develops a practical private mean-estimation method built from private thresholding and the inverse sensitivity mechanism, with a guarantee that matches or exceeds the asymptotic performance of existing algorithms under a range of distributional assumptions. The authors extend the approach to a wider class of monotone properties, including means, medians, quantiles, and $\ell_p$-norm minimizers, and analyze distribution-specific performance in the SME setting for subgaussian and bounded-moment families, with a possible extension to higher dimensions. The resulting guarantees are near-minimax optimal without requiring strong distributional priors, which makes private estimation more robust on real data.
Abstract
We propose a new definition of instance optimality for differentially private estimation algorithms. Our definition requires an optimal algorithm to compete, simultaneously for every dataset $D$, with the best private benchmark algorithm that (a) knows $D$ in advance and (b) is evaluated by its worst-case performance on large subsets of $D$. That is, the benchmark algorithm need not perform well when potentially extreme points are added to $D$; it only has to handle the removal of a small number of real data points that already exist. This makes our benchmark significantly stronger than those proposed in prior work. We nevertheless show, for real-valued datasets, how to construct private algorithms that achieve our notion of instance optimality when estimating a broad class of dataset properties, including means, quantiles, and $\ell_p$-norm minimizers. For means in particular, we provide a detailed analysis and show that our algorithm simultaneously matches or exceeds the asymptotic performance of existing algorithms under a range of distributional assumptions.
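One way to read the definition above is as the following inequality. This is only a sketch under assumed notation, none of which is fixed by the abstract: $\theta$ is the property being estimated, $\ell$ is a loss, $\mathcal{A}_\varepsilon$ is the class of $\varepsilon$-differentially private algorithms, $r$ is the number of points that may be removed, and $\alpha, \beta$ are hypothetical multiplicative and additive slack terms. The benchmark risk at a dataset $D$ would be

$$
R(D, r) \;=\; \inf_{A' \in \mathcal{A}_\varepsilon} \; \sup_{D' \subseteq D,\; |D'| \ge |D| - r} \; \mathbb{E}\big[\ell\big(A'(D'),\, \theta(D')\big)\big],
$$

and an algorithm $A$ would be called subset-optimal if, simultaneously for every dataset $D$,

$$
\mathbb{E}\big[\ell\big(A(D),\, \theta(D)\big)\big] \;\le\; \alpha \cdot R(D, r) + \beta .
$$

The infimum is taken separately for each $D$, which reflects condition (a): the benchmark may be tailored to $D$. The supremum ranges only over subsets of $D$, which reflects condition (b): the benchmark is never charged for datasets obtained by adding new points.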
