On the Query Complexity of Training Data Reconstruction in Private Learning
Prateeti Mukherjee, Satya Lokam
TL;DR
The paper quantifies how many queries a whitebox adversary must make to reconstruct training samples from a learner that is differentially private, Rényi DP, or Metric DP. For DP learners on arbitrary compact metric spaces, it derives non-asymptotic, minimax-optimal lower bounds on the adversary's query complexity; it also gives lower bounds for Rényi DP learners and, via Metric DP, for locally compact spaces. The bounds depend explicitly on the privacy parameters, the target reconstruction tolerance, and the geometric complexity of the data domain. They are nearly matched by upper bounds achieved by classical privacy mechanisms, and they extend beyond bounded Euclidean domains to unbounded metric spaces, where dimension-like terms govern the difficulty of reconstruction. In practice, the work offers a principled framework for auditing private learners and clarifies how privacy parameters constrain reconstruction risk across a broad range of data domains and learning paradigms.
Abstract
We analyze the number of queries that a whitebox adversary needs to make to a private learner in order to reconstruct its training data. For $(ε, δ)$-DP learners with training data drawn from an arbitrary compact metric space, we provide the first known lower bounds on the adversary's query complexity as a function of the learner's privacy parameters. Our results are minimax optimal for every $ε \geq 0, δ \in [0, 1]$, covering both $ε$-DP and $(0, δ)$-DP as corollaries. Beyond this, we obtain query complexity lower bounds for $(α, ε)$ Rényi DP learners that are valid for any $α > 1, ε \geq 0$. Finally, we analyze data reconstruction attacks on locally compact metric spaces via the framework of Metric DP, a generalization of DP that accounts for the underlying metric structure of the data. In this setting, we provide the first known analysis of data reconstruction in unbounded, high-dimensional spaces and obtain query complexity lower bounds that are nearly tight modulo logarithmic factors.
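
For reference, the three privacy notions named above can be written in their standard textbook forms, as sketched below. The notation is an assumption and not taken from the paper: $\mathcal{A}$ is a randomized learner, $D \sim D'$ are neighboring datasets, $S$ is any measurable set of outputs, and $d$ is the metric on the data space. The paper's own definitions, in particular for Metric DP on datasets, may differ in detail.

```latex
% (\varepsilon, \delta)-DP: for all neighboring D \sim D' and measurable S
\Pr[\mathcal{A}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{A}(D') \in S] + \delta

% (\alpha, \varepsilon) Renyi DP, \alpha > 1: bounded Renyi divergence of order \alpha
D_{\alpha}\bigl(\mathcal{A}(D)\,\|\,\mathcal{A}(D')\bigr) \;\le\; \varepsilon,
\qquad
D_{\alpha}(P\,\|\,Q) \;=\; \frac{1}{\alpha - 1}\,
  \log \mathbb{E}_{x \sim Q}\!\left[\left(\frac{P(x)}{Q(x)}\right)^{\alpha}\right]

% \varepsilon-Metric DP (single-point form): the guarantee scales with the distance d(x, x')
\Pr[\mathcal{A}(x) \in S] \;\le\; e^{\varepsilon\, d(x, x')}\,\Pr[\mathcal{A}(x') \in S]
```

Setting $\delta = 0$ in the first inequality gives pure $ε$-DP, and setting $\varepsilon = 0$ gives $(0, δ)$-DP, the two special cases the abstract lists as corollaries.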
