Privately Estimating Black-Box Statistics
Günter F. Steinke, Thomas Steinke
TL;DR
This work addresses differentially private estimation of black-box statistics when the global sensitivity is large or unknown, treating $f$ as a black box applied to a private dataset. The authors design a tunable algorithm that evaluates $f$ on $k$ overlapping subsets of size $n-m$, chosen according to a covering design, and privately aggregates the results with a shifted inverse mechanism to produce an $(\varepsilon,\delta)$-DP output $M^f(x)$. The parameters trade statistical accuracy against oracle complexity. A key contribution is a formal characterization of this tradeoff curve: special cases recover the sample-and-aggregate approach and LRSS25, and intermediate settings form a continuum that yields $k=O(t^c)$ evaluations for practical parameter choices. The authors also prove a lower bound of ${n \choose t}/{m \choose t}$ on the number of oracle calls required, showing that the method's oracle efficiency is close to the theoretical limit and that the approach is viable for black-box statistics without explicit sensitivity bounds.
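To get a feel for the size of the quantity ${n \choose t}/{m \choose t}$ quoted above, the following minimal sketch evaluates it for concrete integers. The function name and the example values are illustrative assumptions, not taken from the paper; the summary does not define $t$, so the code treats $n$, $m$, and $t$ simply as integers with $t \le m \le n$ and rounds the ratio up to a whole number of calls.

```python
from fractions import Fraction
from math import ceil, comb


def oracle_call_lower_bound(n: int, m: int, t: int) -> int:
    """Return the smallest integer >= C(n, t) / C(m, t).

    Hypothetical helper: it only evaluates the ratio quoted in the
    summary and does not reproduce any constants from the paper.
    """
    if not (0 <= t <= m <= n):
        raise ValueError("expected 0 <= t <= m <= n")
    # Exact rational arithmetic avoids floating-point rounding for large n.
    ratio = Fraction(comb(n, t), comb(m, t))
    return ceil(ratio)


if __name__ == "__main__":
    # C(100, 2) = 4950 and C(10, 2) = 45, so the ratio is exactly 110.
    print(oracle_call_lower_bound(100, 10, 2))  # prints 110
```

With $n$ and $m$ held fixed, the ratio grows quickly in $t$: the same $n = 100$, $m = 10$ with $t = 3$ gives 1348.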
Abstract
Standard techniques for differentially private estimation, such as Laplace or Gaussian noise addition, require guaranteed bounds on the sensitivity of the estimator in question. But such sensitivity bounds are often large or simply unknown. Thus we seek differentially private methods that can be applied to arbitrary black-box functions. A handful of such techniques exist, but all are either inefficient in their use of data or require evaluating the function on exponentially many inputs. In this work we present a scheme that trades off between statistical efficiency (i.e., how much data is needed) and oracle efficiency (i.e., the number of evaluations). We also present lower bounds showing the near-optimality of our scheme.
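As a point of reference for the standard techniques mentioned above, here is a minimal sketch of Laplace noise addition for a mean. It is not the paper's method; it illustrates why a sensitivity bound is needed. The function names, the clipping range $[0, 1]$, and the replace-one-record notion of neighboring datasets are assumptions made for this example. Clipping is what supplies the sensitivity bound $(\text{upper} - \text{lower})/n$; an arbitrary black-box $f$ offers no such handle.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    # Two i.i.d. exponentials with mean `scale`; their difference is Laplace.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, epsilon, rng, lower=0.0, upper=1.0):
    """Laplace-mechanism estimate of the mean of values clipped to [lower, upper].

    Assumes neighboring datasets differ by replacing one record, so the
    clipped mean has sensitivity (upper - lower) / n.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not values:
        raise ValueError("values must be non-empty")
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / n
    # Noise scale = sensitivity / epsilon, as in the standard Laplace mechanism.
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [0.2, 0.4, 0.6, 0.9, 0.1]
    print(private_mean(data, epsilon=1.0, rng=rng))
```

This sketch uses ordinary floating-point sampling for readability, which is known to be unsafe for real deployments of differential privacy; a production implementation would rely on a vetted library.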
