Online Distribution Learning with Local Private Constraints

Jin Sima; Changlong Wu; Olgica Milenkovic; Wojciech Szpankowski

TL;DR

The paper analyzes online conditional distribution learning under local differential privacy with unbounded label sets, formalizing a minimax KL-risk objective. It proves a fundamental lower bound of $\Omega\left(\frac{1}{\epsilon}\sqrt{KT}\right)$ and presents near-matching upper bounds using an EXP3-inspired privatization scheme with log-likelihood perturbations, clipping, and a single-coordinate noise reduction, plus a pure-DP variant achieving similar rates. The approach bridges online learning with private probability estimation and shows how KL-risk translates to averaged TV-risk via Pinsker’s inequality, recovering batch results in the non-interactive setting. Together, these results illuminate the limits and design of privacy-preserving online distribution learning when the label alphabet is unbounded and highlight techniques that tame log-likelihood sensitivity under local privacy constraints.
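
The KL-to-TV translation mentioned above can be sketched with the standard form of Pinsker's inequality followed by Jensen's inequality. The notation $\hat{p}_t$ for the learner's estimate at time $t$ is an assumption introduced here for illustration, and the paper's exact statement may differ in constants and logarithmic factors:

$$
\frac{1}{T}\sum_{t=1}^{T}\mathrm{TV}\big(f(\boldsymbol{x}_t),\hat{p}_t\big)
\;\le\; \frac{1}{T}\sum_{t=1}^{T}\sqrt{\tfrac{1}{2}\,\mathrm{KL}\big(f(\boldsymbol{x}_t)\,\|\,\hat{p}_t\big)}
\;\le\; \sqrt{\frac{1}{2T}\sum_{t=1}^{T}\mathrm{KL}\big(f(\boldsymbol{x}_t)\,\|\,\hat{p}_t\big)}.
$$

The first step is Pinsker's inequality applied at each round, and the second uses the concavity of the square root. If the cumulative KL-risk is $\tilde{O}\left(\frac{1}{\epsilon}\sqrt{KT}\right)$, the right-hand side is $\tilde{O}\left(\epsilon^{-1/2}(K/T)^{1/4}\right)$, so the averaged TV-risk vanishes as $T$ grows with $K$ and $\epsilon$ fixed.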

Abstract

We study the problem of online conditional distribution estimation with \emph{unbounded} label sets under local differential privacy. Let $\mathcal{F}$ be a distribution-valued function class with an unbounded label set. We aim to estimate an \emph{unknown} function $f\in \mathcal{F}$ in an online fashion, so that at time $t$, when the context $\boldsymbol{x}_t$ is provided, we can generate an estimate of $f(\boldsymbol{x}_t)$ under KL-divergence knowing only a privatized version of the true labels sampled from $f(\boldsymbol{x}_t)$. The ultimate objective is to minimize the cumulative KL-risk over a finite horizon $T$. We show that under $(\epsilon,0)$-local differential privacy of the privatized labels, the KL-risk grows as $\tilde{\Theta}(\frac{1}{\epsilon}\sqrt{KT})$ up to poly-logarithmic factors, where $K=|\mathcal{F}|$. This is in stark contrast to the $\tilde{\Theta}(\sqrt{T\log K})$ bound demonstrated by Wu et al. (2023a) for bounded label sets. As a byproduct, our results recover a nearly tight upper bound for the hypothesis selection problem of Gopi et al. (2020) established only for the batch setting.
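
To get a feel for the gap between the two rates quoted in the abstract, the following minimal Python sketch evaluates both growth terms with all constants and poly-logarithmic factors dropped. The function names are hypothetical, and the numbers are orders of magnitude rather than actual risk values. The abstract does not state how the bounded-label rate depends on $\epsilon$, so that term is evaluated without it.

```python
import math


def ldp_unbounded_rate(K: int, T: int, epsilon: float) -> float:
    """Leading term (1/epsilon) * sqrt(K * T); constants and log factors dropped."""
    return math.sqrt(K * T) / epsilon


def bounded_label_rate(K: int, T: int) -> float:
    """Leading term sqrt(T * log K); constants and log factors dropped."""
    return math.sqrt(T * math.log(K))


if __name__ == "__main__":
    # Illustrative values chosen for this example only.
    K, T, epsilon = 100, 10_000, 1.0
    print("unbounded labels, local DP:", ldp_unbounded_rate(K, T, epsilon))
    print("bounded labels:", bounded_label_rate(K, T))
```

The first term is polynomial in the class size $K$ while the second is only logarithmic in it, which is the contrast the abstract draws.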

Paper Structure (18 sections, 14 theorems, 62 equations, 2 algorithms)

This paper contains 18 sections, 14 theorems, 62 equations, 2 algorithms.

Introduction
Results and Techniques
Related Work
Summary of contributions.
Problem Setup and Preliminaries
Problem Formulation
An $\Omega(\sqrt{KT})$ Lower Bound
Approximate-DP via WMA
The weighted majority algorithm
Our scheme
Clipping of distributions.
The privatization scheme.
Distribution learning algorithm.
Pure-DP via Modified EXP3
Bounding Averaged TV-risk
...and 3 more sections

Key Result

Theorem 1

There exists a finite class $\mathcal{F}$ of size $K$ with $|\mathcal{Y}|\le K$ such that for any $(\epsilon,0)$-locally differentially private mechanism and any learning rule, the KL-risk is lower bounded by $\Omega(\frac{1}{\epsilon}\sqrt{KT})$.
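
For readers unfamiliar with the kind of mechanism the theorem quantifies over, here is a minimal Python sketch of k-ary randomized response, a standard $(\epsilon,0)$-locally differentially private mechanism on a finite label set. It is not the paper's privatization scheme and not the construction used in the lower bound. All function names are hypothetical, and labels are assumed to be the integers $0,\dots,k-1$ with $k \ge 2$.

```python
import math
import random


def randomized_response(y: int, k: int, epsilon: float, rng: random.Random) -> int:
    """Report the true label y with probability e^eps / (e^eps + k - 1),
    otherwise report one of the other k - 1 labels uniformly at random."""
    e = math.exp(epsilon)
    if rng.random() < e / (e + k - 1):
        return y
    other = rng.randrange(k - 1)  # index among the k - 1 labels different from y
    return other if other < y else other + 1


def output_probability(y_out: int, y_in: int, k: int, epsilon: float) -> float:
    """Exact probability that the mechanism outputs y_out on input y_in."""
    e = math.exp(epsilon)
    return e / (e + k - 1) if y_out == y_in else 1.0 / (e + k - 1)


def max_privacy_ratio(k: int, epsilon: float) -> float:
    """Largest ratio P[out | a] / P[out | b] over all outputs and input pairs.
    (epsilon, 0)-local DP requires this ratio to be at most e^epsilon."""
    return max(
        output_probability(o, a, k, epsilon) / output_probability(o, b, k, epsilon)
        for o in range(k)
        for a in range(k)
        for b in range(k)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    print(randomized_response(2, k=5, epsilon=1.0, rng=rng))
    print(max_privacy_ratio(5, 1.0), math.exp(1.0))
```

The helper `max_privacy_ratio` checks the defining inequality of $(\epsilon,0)$-local differential privacy directly: no output is more than $e^{\epsilon}$ times likelier under one input label than under another.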

Theorems & Definitions (23)

Theorem 1: Lower Bound
Theorem 2: Upper Bound
Definition 1
Theorem 3
Proof: Sketch of Proof
Lemma 1: shalev2014understanding
Lemma 2: steinke2022composition
Lemma 3
proof
Lemma 4
...and 13 more
