Private Prediction via Shrinkage

Chao Yan

TL;DR

This paper studies private prediction in streaming settings and reduces the dependence on the number of queries $T$ from the standard $\sqrt{T}$ to polylogarithmic in $T$. Building on the Dwork–Feldman and Naor–NNSY frameworks, it combines subsample-and-aggregate and sparse-vector techniques with a shrinkage strategy that bounds the number of hard queries, which allows private labeling of exponentially many queries against oblivious online adversaries. For adaptive online adversaries and halfspaces in $\mathbb{R}^d$, it uses a geometric reduction to linear feasibility via $cdepth$, showing that after at most $d+1$ constraint halvings the remaining hypotheses agree on future queries, and achieves a sample complexity of $\tilde{O}(d^{5.5}\log T)$. Overall, the results show that super-polynomial query streams can be answered privately with polylogarithmic dependence on $T$ under standard adversary models, with concrete bounds in terms of the VC dimension and the ambient dimension.

Abstract

We study differentially private prediction, introduced by Dwork and Feldman (COLT 2018): an algorithm receives a single labeled sample set $S$ and then answers a stream of unlabeled queries, while the output transcript remains $(\varepsilon,\delta)$-differentially private with respect to $S$. Standard composition yields a $\sqrt{T}$ dependence for $T$ queries. We show that this dependence can be reduced to polylogarithmic in $T$ in streaming settings. For an oblivious online adversary and any concept class $\mathcal{C}$, we give a private predictor that answers $T$ queries with $|S|= \tilde{O}(VC(\mathcal{C})^{3.5}\log^{3.5}T)$ labeled examples. For an adaptive online adversary and halfspaces over $\mathbb{R}^d$, we obtain $|S|=\tilde{O}\left(d^{5.5}\log T\right)$.
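To make the $\sqrt{T}$ baseline concrete, the sketch below evaluates the advanced composition bound (Theorem 3 in the paper's list, DRV10): composing $k$ mechanisms that are each $(\varepsilon_0,\delta_0)$-differentially private gives total privacy loss $\varepsilon' = \varepsilon_0\sqrt{2k\ln(1/\delta')} + k\varepsilon_0(e^{\varepsilon_0}-1)$ (with failure probability $k\delta_0+\delta'$). The function name and the parameter values are illustrative assumptions, not taken from the paper.

```python
import math


def advanced_composition_epsilon(eps0: float, k: int, delta_prime: float) -> float:
    """Total epsilon after k-fold advanced composition of eps0-DP mechanisms.

    delta_prime is the extra slack added to the delta term of the composed
    guarantee. The name and signature are hypothetical, for illustration only.
    """
    leading = eps0 * math.sqrt(2 * k * math.log(1 / delta_prime))
    lower_order = k * eps0 * (math.exp(eps0) - 1)
    return leading + lower_order


if __name__ == "__main__":
    eps0, delta_prime = 1e-4, 1e-6  # illustrative values
    for k in (100, 400, 1600):
        total = advanced_composition_epsilon(eps0, k, delta_prime)
        # Quadrupling the number of queries roughly doubles the total epsilon.
        print(k, total)
```

For small `eps0` the leading term dominates, so holding the total privacy budget fixed over $T$ queries forces the per-query budget to shrink like $1/\sqrt{T}$. This is the dependence the paper replaces with a polylogarithmic one.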

Paper Structure (23 sections, 13 theorems, 17 equations, 3 figures, 1 table, 4 algorithms)

Introduction
Adversary models.
Our Result
The main ideas
Discussion and Open Questions
Prediction Model
Preliminaries
Learning Theory
Differential Privacy
Algorithm BetweenThresholds [BunSU16]
A Generic Framework
Private Prediction with Oblivious Adversary
Hypotheses Generator for Oblivious Adversary
Analysis
Prediction for Halfspaces with Adaptive Adversary
...and 8 more sections

Key Result

Theorem 1

Let $(X,R)$ have VC dimension $d$. Let $S\subseteq X$ and let $0<\alpha,\beta\leq 1$. Let $S'\subseteq S$ be a random subset of $S$ of size at least $O\left(\frac{d\cdot\log\frac{d}{\alpha}+\log\frac{1}{\beta}}{\alpha^2}\right).$ Then with probability at least $1-\beta$, $S'$ is an $\alpha$-approximation for $S$.
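As a rough illustration of how the size bound in Theorem 1 scales, the sketch below evaluates $\frac{d\log(d/\alpha)+\log(1/\beta)}{\alpha^2}$ for given parameters. The theorem hides a constant inside the $O(\cdot)$; the multiplier `c` (default 1) and the function name are assumptions made here for illustration, so the outputs show growth rates rather than usable sample sizes.

```python
import math


def approximation_sample_size(d: int, alpha: float, beta: float, c: float = 1.0) -> int:
    """Evaluate c * (d*log(d/alpha) + log(1/beta)) / alpha^2, rounded up.

    c stands in for the unspecified constant hidden by the O(.) in Theorem 1;
    it is a hypothetical placeholder, not a value from the paper.
    """
    if not (0 < alpha <= 1 and 0 < beta <= 1):
        raise ValueError("alpha and beta must lie in (0, 1]")
    numerator = d * math.log(d / alpha) + math.log(1 / beta)
    return math.ceil(c * numerator / alpha**2)


if __name__ == "__main__":
    # Halving alpha more than quadruples the bound because of the 1/alpha^2 factor.
    for alpha in (0.2, 0.1, 0.05):
        print(alpha, approximation_sample_size(d=10, alpha=alpha, beta=0.05))
```

The dependence on $\beta$ is only logarithmic, so a smaller failure probability costs far less than a smaller approximation error $\alpha$.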

Figures (3)

Figure 1: In the left figure, we illustrate that when a “hard” query $x$ occurs, the current hypothesis set splits into two subspaces, $\mathcal{C}|_{h(x)=1}$ and $\mathcal{C}|_{h(x)=-1}$. We then guess a label uniformly at random (say, 1) and update the hypothesis set by restricting to $\mathcal{C}|_{h(x)=1}$, as shown in the right figure.
Figure 2: After answering $O(VC(\mathcal{C})\log T)$ “hard” queries, the remaining hypothesis space collapses to a set of hypotheses that induce the same labeling on the entire query sequence $x_1,\dots, x_T$ (including queries that have not yet appeared in the stream).
Figure 3: In the left figure, we illustrate that when a “hard” query occurs, the current hypotheses are roughly split across the two sides of the induced hyperplane. We then update the feasible set by restricting all hypotheses to lie on this hyperplane for subsequent rounds, while preserving the existence of a hypothesis (point) with high $cdepth$, as shown in the right figure.
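The shrinkage step described in Figures 1 and 2 can be sketched on a toy class. The code below is a non-private simplification: it uses one-dimensional thresholds as the hypothesis class and calls a query "hard" whenever the remaining hypotheses disagree on it, which is an assumption made here and not the paper's private test. On a hard query it guesses a label uniformly at random and restricts the hypothesis set to the hypotheses that agree with the guess, as in Figure 1. All names are hypothetical.

```python
import random


def make_thresholds(n: int) -> list[int]:
    """Toy class: hypothesis t labels x as +1 iff x >= t, for t in 0..n."""
    return list(range(n + 1))


def label(t: int, x: int) -> int:
    return 1 if x >= t else -1


def shrink_on_stream(hypotheses, queries, rng):
    """Answer queries while shrinking the hypothesis set (non-private toy)."""
    remaining = list(hypotheses)
    answers, hard_count = [], 0
    for x in queries:
        labels = {label(t, x) for t in remaining}
        if len(labels) == 1:
            # Easy query: every remaining hypothesis gives the same label.
            y = labels.pop()
        else:
            # Hard query: guess uniformly at random, then keep only the
            # hypotheses consistent with the guess.
            y = rng.choice([1, -1])
            remaining = [t for t in remaining if label(t, x) == y]
            hard_count += 1
        answers.append(y)
    return remaining, answers, hard_count


if __name__ == "__main__":
    stream = [8, 3, 12, 8, 5, 3]
    remaining, answers, hard = shrink_on_stream(
        make_thresholds(16), stream, random.Random(0)
    )
    print(answers, hard, len(remaining))
```

Either branch of a hard query leaves a non-empty set, and once a query has been processed all surviving hypotheses agree on it, so repeated queries are never hard again. The paper's algorithms add the privacy mechanism and the bound on the number of hard queries, neither of which this sketch reproduces.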

Theorems & Definitions (45)

Definition 1
Definition 2: Vapnik-Chervonenkis dimension [VC, Haussler1986EpsilonnetsAS]
Definition 3: $\alpha$-approximation [VC, Haussler1986EpsilonnetsAS]
Theorem 1: [VC, Haussler1986EpsilonnetsAS]
Definition 4: Generalization and empirical error
Theorem 2: [BlumerEhHaWa89, kaplan2020private]
Definition 5: Differential Privacy [DMNS06]
Theorem 3: Advanced composition [DRV10]
Lemma 1: Privacy for BetweenThresholds [BunSU16]
Lemma 2: Accuracy for BetweenThresholds [BunSU16]
...and 35 more
