Private Prediction via Shrinkage
Chao Yan
TL;DR
This paper advances private prediction in streaming settings by reducing the dependence on the number of queries $T$ from the standard $\sqrt{T}$ to polylogarithmic. Building on the frameworks of Dwork and Feldman and of Naor et al. (NNSY), it combines subsample-and-aggregate and sparse-vector techniques with a shrinkage strategy that bounds the number of hard queries, which enables private labeling of exponentially many queries against an oblivious online adversary. Against an adaptive online adversary, for halfspaces in $\mathbb{R}^d$, it uses a geometric reduction to linear feasibility via $\mathrm{cdepth}$: after at most $d+1$ constraint halvings the remaining hypotheses agree on future queries, which yields a sample complexity of $\tilde{O}(d^{5.5}\log T)$. Overall, the results show that super-polynomial query streams can be answered privately with polylogarithmic dependence on $T$ under standard adversary models, with concrete bounds in terms of the VC dimension and the ambient dimension.
Abstract
We study differentially private prediction, introduced by Dwork and Feldman (COLT 2018): an algorithm receives a single labeled sample set $S$ and then answers a stream of unlabeled queries, while the output transcript remains $(\varepsilon,\delta)$-differentially private with respect to $S$. Standard composition yields a $\sqrt{T}$ dependence on the number of queries $T$. We show that this dependence can be reduced to polylogarithmic in $T$ in streaming settings. For an oblivious online adversary and any concept class $\mathcal{C}$, we give a private predictor that answers $T$ queries with $|S|=\tilde{O}\left(\mathrm{VC}(\mathcal{C})^{3.5}\log^{3.5}T\right)$ labeled examples. For an adaptive online adversary and halfspaces over $\mathbb{R}^d$, we obtain $|S|=\tilde{O}\left(d^{5.5}\log T\right)$.
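To make the $\sqrt{T}$ baseline concrete, here is a minimal numerical sketch. It is not taken from the paper. It assumes the well-known advanced composition bound, under which $k$ adaptively chosen $(\varepsilon_0,\delta_0)$-differentially private mechanisms are jointly $(\varepsilon', k\delta_0+\delta')$-differentially private with $\varepsilon' = \sqrt{2k\ln(1/\delta')}\,\varepsilon_0 + k\varepsilon_0(e^{\varepsilon_0}-1)$. The function names and the slack parameter `delta_prime` are illustrative choices. The sketch solves for the largest per-query budget $\varepsilon_0$ that keeps the total at $\varepsilon$ over $T$ queries; since the number of samples a single private answer needs typically scales like $1/\varepsilon_0$, a budget shrinking like $1/\sqrt{T}$ corresponds to the $\sqrt{T}$ dependence.

```python
import math


def advanced_composition_epsilon(eps0: float, k: int, delta_prime: float) -> float:
    """Total epsilon after k adaptive eps0-budget mechanisms (advanced composition)."""
    return (math.sqrt(2 * k * math.log(1 / delta_prime)) * eps0
            + k * eps0 * (math.exp(eps0) - 1))


def per_query_epsilon(eps_total: float, k: int, delta_prime: float,
                      iters: int = 100) -> float:
    """Largest eps0 whose k-fold composition stays within eps_total.

    Uses bisection; the composed epsilon is increasing in eps0.
    Assumes the composed epsilon at eps0 = eps_total exceeds eps_total,
    which holds for small delta_prime.
    """
    lo, hi = 0.0, eps_total
    for _ in range(iters):
        mid = (lo + hi) / 2
        if advanced_composition_epsilon(mid, k, delta_prime) <= eps_total:
            lo = mid  # mid is still within budget
        else:
            hi = mid  # mid overspends the budget
    return lo


if __name__ == "__main__":
    # Illustrative parameters, not values from the paper.
    for T in (100, 10_000, 1_000_000):
        eps0 = per_query_epsilon(eps_total=1.0, k=T, delta_prime=1e-6)
        print(f"T={T:>9}  eps0={eps0:.6f}  eps0*sqrt(T)={eps0 * math.sqrt(T):.4f}")
```

In this sketch the product `eps0 * sqrt(T)` stays roughly constant as $T$ grows, which is the $\sqrt{T}$ cost that the polylogarithmic bounds above avoid.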
