Fingerprinting Codes Meet Geometry: Improved Lower Bounds for Private Query Release and Adaptive Data Analysis
Xin Lyu, Kunal Talwar
TL;DR
This work introduces a geometry-aware framework that combines fingerprinting codes with exponential-family tools to derive lower bounds for private query release and adaptive data analysis. By embedding query sets into a geometric setting and analyzing a score via divergence theorems, the authors obtain sample complexity lower bounds that match known upper bounds: $\Omega(\sqrt{\log|\mathcal{X}|}\,\log Q/\alpha^3)$ for answering $Q$ adaptive counting queries over a universe $\mathcal{X}$ to accuracy $\alpha$, and $\Omega(\sqrt{\log|\mathcal{X}|\log(1/\delta)}\,\log Q/(\varepsilon\alpha^2))$ for answering $Q$ counting queries under $(\varepsilon,\delta)$-DP. They also give a complete picture for random 0/1 queries under approximate DP. The results extend to composition bounds, two-way marginals, and adaptive data analysis, giving a cohesive set of lower bounds that nearly close the gaps with known upper bounds. The framework relies on exponential tilts, the geometry of the input set, and a divergence-to-score mechanism, and it clarifies when privacy constraints fundamentally limit what can be released. These findings inform the design of DP mechanisms and adaptive data analysis algorithms by identifying instance-specific limits and the role of domain geometry.
Abstract
Fingerprinting codes are a crucial tool for proving lower bounds in differential privacy. They have been used to prove tight lower bounds for several fundamental questions, especially in the ``low accuracy'' regime. Unlike reconstruction/discrepancy approaches, however, they are better suited to query sets that arise naturally from the fingerprinting code construction. In this work, we propose a general framework for proving fingerprinting-type lower bounds that allows us to tailor the technique to the geometry of the query set. Our approach allows us to prove several new results, including the following. First, we show that any (sample- and population-)accurate algorithm for answering $Q$ arbitrary adaptive counting queries over a universe $\mathcal{X}$ to accuracy $\alpha$ needs $\Omega(\frac{\sqrt{\log |\mathcal{X}|}\cdot \log Q}{\alpha^3})$ samples, matching known upper bounds. This shows that the approaches based on differential privacy are optimal for this question, and improves significantly on the previously known lower bounds of $\frac{\log Q}{\alpha^2}$ and $\min(\sqrt{Q}, \sqrt{\log |\mathcal{X}|})/\alpha^2$. Second, we show that any $(\varepsilon,\delta)$-DP algorithm for answering $Q$ counting queries to accuracy $\alpha$ needs $\Omega(\frac{\sqrt{ \log|\mathcal{X}| \log(1/\delta)} \log Q}{\varepsilon\alpha^2})$ samples, matching known upper bounds up to constants. Our framework allows for proving this bound via a direct correlation analysis and improves the prior bound of [BUV'14] by a factor of $\sqrt{\log(1/\delta)}$. Third, we characterize the sample complexity of answering a set of random $0$-$1$ queries under approximate differential privacy. We give new upper and lower bounds in different regimes. Combined with known results, these bounds complete the picture.
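To give a feel for the size of these improvements, the following minimal sketch evaluates the stated bounds numerically. It is only an illustration: the hidden constants in the $\Omega(\cdot)$ notation are set to 1, natural logarithms are assumed, the function names are hypothetical, and the prior DP bound is inferred from the stated $\sqrt{\log(1/\delta)}$ improvement factor rather than quoted from [BUV'14].

```python
import math


def ada_new_bound(universe_size, num_queries, alpha):
    """sqrt(log|X|) * log(Q) / alpha^3, with the Omega constant set to 1."""
    return math.sqrt(math.log(universe_size)) * math.log(num_queries) / alpha**3


def ada_prior_bound(universe_size, num_queries, alpha):
    """Larger of the two previously known bounds quoted in the abstract."""
    first = math.log(num_queries) / alpha**2
    second = min(math.sqrt(num_queries),
                 math.sqrt(math.log(universe_size))) / alpha**2
    return max(first, second)


def dp_new_bound(universe_size, num_queries, alpha, eps, delta):
    """sqrt(log|X| * log(1/delta)) * log(Q) / (eps * alpha^2), constant 1."""
    return (math.sqrt(math.log(universe_size) * math.log(1.0 / delta))
            * math.log(num_queries) / (eps * alpha**2))


def dp_prior_bound(universe_size, num_queries, alpha, eps, delta):
    """Assumption: the new bound divided by the stated improvement factor."""
    new = dp_new_bound(universe_size, num_queries, alpha, eps, delta)
    return new / math.sqrt(math.log(1.0 / delta))


if __name__ == "__main__":
    # Illustrative parameters only; they do not come from the paper.
    X, Q, alpha, eps, delta = 2**100, 1000, 0.1, 1.0, 1e-9
    print("ADA new / prior:", ada_new_bound(X, Q, alpha) / ada_prior_bound(X, Q, alpha))
    print("DP  new / prior:", dp_new_bound(X, Q, alpha, eps, delta)
          / dp_prior_bound(X, Q, alpha, eps, delta))
```

Because constants are dropped, only the ratios and scaling trends are meaningful here, not the absolute sample counts.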
