Instance-Optimality for Private KL Distribution Estimation
Jiayuan Ye, Vitaly Feldman, Kunal Talwar
TL;DR
This work tackles the problem of estimating a discrete distribution under KL divergence with differential privacy, arguing that minimax guarantees miss per-instance difficulty. It introduces instance-optimality with additive local neighborhoods tailored to KL, and develops both non-DP and DP estimators that are instance-optimal up to constants. The core technical tools include a generalized Assouad method for decomposable distances, a sampling-twice modification of Good-Turing to reduce sensitivity, and a calibrated thresholding strategy to privatize the estimator. Empirically, the proposed DP instance-optimal estimator outperforms naive minimax baselines on power-law and real-world token distributions, while non-DP instance-optimal methods remain competitive, highlighting the practical impact of instance-adaptive private KL estimation.
Abstract
We study the fundamental problem of estimating an unknown discrete distribution $p$ over $d$ symbols, given $n$ i.i.d. samples from the distribution. We are interested in minimizing the KL divergence between the true distribution and the algorithm's estimate. We first construct minimax-optimal differentially private (DP) estimators. Minimax optimality, however, fails to shed light on an algorithm's performance on individual (non-worst-case) instances $p$, and simple minimax-optimal DP estimators can have poor empirical performance on real distributions. We then study this problem from an instance-optimality viewpoint, where the algorithm's error on $p$ is compared to the minimum achievable estimation error over a small local neighborhood of $p$. Under natural notions of local neighborhood, we propose algorithms that achieve instance-optimality up to constant factors, with and without a differential privacy constraint. Our upper bounds rely on (private) variants of the Good-Turing estimator. Our lower bounds use additive local neighborhoods that capture the hardness of distribution estimation in KL divergence more precisely than those considered in prior work.
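
To make the problem setup concrete, the following minimal sketch draws $n$ samples from a power-law distribution over $d$ symbols and measures the KL divergence of two simple estimates. It is not the paper's estimator. The add-constant smoothing, the Laplace-noise baseline, the replacement notion of neighboring datasets, and all names and parameter values (`add_constant_estimate`, `naive_private_estimate`, `alpha`, `d`, `n`, `epsilon`) are assumptions chosen for illustration only.

```python
import math
import random
from collections import Counter


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), with the convention 0 * log(0 / q) = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def add_constant_estimate(counts, alpha=1.0):
    """Add-alpha smoothing (illustrative assumption): keeps every q_i > 0 so the KL is finite."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


def laplace_noise(scale, rng):
    """The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale)."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def naive_private_estimate(counts, epsilon, rng, alpha=1.0):
    """Naive Laplace-mechanism baseline (an assumption, not the paper's algorithm).

    Assuming neighboring datasets differ by replacing one sample, two counts
    change by 1 each, so the histogram has L1 sensitivity 2 and the noise
    scale is 2 / epsilon. Clipping and smoothing are post-processing.
    """
    noisy = [max(c + laplace_noise(2.0 / epsilon, rng), 0.0) for c in counts]
    return add_constant_estimate(noisy, alpha)


def sample_counts(p, n, rng):
    """Draw n i.i.d. samples from p and return the per-symbol histogram."""
    tally = Counter(rng.choices(range(len(p)), weights=p, k=n))
    return [tally.get(i, 0) for i in range(len(p))]


rng = random.Random(0)
d, n, epsilon = 50, 2000, 1.0  # hypothetical values for illustration

# Power-law (Zipf-like) distribution: p_i proportional to 1 / (i + 1).
weights = [1.0 / (i + 1) for i in range(d)]
z = sum(weights)
p = [w / z for w in weights]

counts = sample_counts(p, n, rng)
q_plain = add_constant_estimate(counts)
q_private = naive_private_estimate(counts, epsilon, rng)

print("KL(p || non-private estimate):", kl_divergence(p, q_plain))
print("KL(p || naive DP estimate):   ", kl_divergence(p, q_private))
```

A baseline of this kind adds the same noise scale to every symbol regardless of how $p$ is shaped, which is the sort of instance-independent behavior that the instance-optimality viewpoint is meant to improve on.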
