Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment

Bingshan Hu; Zhiming Huang; Nishant A. Mehta; Nidhi Hegde

TL;DR

This work studies differential privacy in stochastic online learning under bandit and full-information feedback and establishes near-optimal regret guarantees. For private bandits, it introduces Anytime-Lazy-UCB and Lazy-DP-TS, which achieve the instance-dependent rate $O\left(\sum_{j:Δ_j>0}\frac{\ln T}{\min\{Δ_j,ε\}}\right)$. For the private full-information setting, it introduces RNM-FTNL, whose instance-dependent and minimax regret bounds match the corresponding lower bounds up to a logarithmic factor. For that setting, the paper also proves the lower bounds $Ω\left(\frac{\ln K}{\min\{Δ_{\min},ε\}}\right)$ and $Ω\left(\sqrt{T \ln K} + \frac{\ln K}{ε}\right)$, clarifying the cost of privacy and how it interacts with problem structure. Experiments validate the practical performance of the proposed methods and show the private TS and RNM-based approaches outperforming the alternatives in various regimes. Overall, the work advances private online learning by delivering anytime, near-optimal algorithms for both bandit and full-information settings, and it identifies open gaps for future research.
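
The TL;DR names RNM-FTNL without describing it. Assuming, as the acronym suggests, that RNM refers to the standard Report Noisy Max mechanism and FTNL to follow-the-noisy-leader, a minimal sketch of noisy-leader selection in the full-information setting might look like the following. It is not the paper's algorithm: the helper names are hypothetical, rewards are assumed Bernoulli in [0, 1], and fresh noise is drawn every round, so the ε guarantee holds only for a single selection. Privacy across all T rounds would need additional accounting that this sketch does not attempt.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace(0, scale)-distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def report_noisy_max(scores, epsilon, rng):
    """Hypothetical helper: index of the largest Laplace-perturbed score.

    Assumes each score changes by at most 1 when one round's reward vector
    (in [0, 1]^K) changes. The conservative scale 2/epsilon then makes this
    single selection epsilon-DP even if coordinates move in different directions.
    """
    noisy = [s + laplace_noise(2.0 / epsilon, rng) for s in scores]
    return max(range(len(noisy)), key=noisy.__getitem__)


def follow_the_noisy_leader(means, horizon, epsilon, seed=0):
    """Hypothetical simulation: pick an action by Report Noisy Max each round."""
    rng = random.Random(seed)
    cumulative = [0.0] * len(means)
    choices = []
    for _ in range(horizon):
        # The choice depends only on rewards observed in earlier rounds.
        choices.append(report_noisy_max(cumulative, epsilon, rng))
        # Full-information feedback: every action's Bernoulli reward is revealed.
        for i, mu in enumerate(means):
            cumulative[i] += 1.0 if rng.random() < mu else 0.0
    return choices


choices = follow_the_noisy_leader([0.2, 0.8], horizon=500, epsilon=10.0, seed=1)
print(sum(c == 1 for c in choices) / len(choices))
```

Because the cumulative-reward gap between actions grows linearly in the number of rounds while the noise scale stays fixed at 2/ε, the noisy leader eventually settles on the best action. A smaller ε delays this, which is one intuition for the additive $\ln K/ε$ term in the lower bound.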

Abstract

In this paper, we study differentially private online learning problems in a stochastic environment under both bandit and full-information feedback. For differentially private stochastic bandits, we propose UCB-based and Thompson Sampling-based algorithms that are anytime and achieve the optimal $O \left(\sum_{j: Δ_j>0} \frac{\ln(T)}{\min \left\{Δ_j, ε\right\}} \right)$ instance-dependent regret bound, where $T$ is the finite learning horizon, $Δ_j$ denotes the suboptimality gap between the optimal arm and a suboptimal arm $j$, and $ε$ is the required privacy parameter. For the differentially private full-information setting with stochastic rewards, we show an $Ω\left(\frac{\ln(K)}{\min \left\{Δ_{\min}, ε\right\}} \right)$ instance-dependent regret lower bound and an $Ω\left(\sqrt{T\ln(K)} + \frac{\ln(K)}{ε}\right)$ minimax lower bound, where $K$ is the total number of actions and $Δ_{\min}$ denotes the minimum suboptimality gap among all the suboptimal actions. For the same differentially private full-information setting, we also present an $ε$-differentially private algorithm whose instance-dependent regret and worst-case regret match our respective lower bounds up to an extra $\ln(T)$ factor.
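
To see how the privacy parameter enters these bounds, the sketch below evaluates each expression with the hidden constants dropped. This is an assumption: the $O$ and $Ω$ notation hides constants the abstract does not state, so the outputs are only relative scales for comparing regimes, not regret predictions. The function names are hypothetical.

```python
import math


def dp_bandit_regret_scale(gaps, horizon, epsilon):
    # sum over suboptimal arms j of ln(T) / min(Delta_j, epsilon); constants dropped
    return sum(math.log(horizon) / min(g, epsilon) for g in gaps if g > 0)


def full_info_instance_lower_scale(num_actions, min_gap, epsilon):
    # ln(K) / min(Delta_min, epsilon); constants dropped
    return math.log(num_actions) / min(min_gap, epsilon)


def full_info_minimax_lower_scale(num_actions, horizon, epsilon):
    # sqrt(T ln K) + ln(K) / epsilon; constants dropped
    return math.sqrt(horizon * math.log(num_actions)) + math.log(num_actions) / epsilon


for eps in (1.0, 0.1, 0.01):
    print(eps, dp_bandit_regret_scale([0.0, 0.1, 0.2], 10_000, eps))
```

With gaps 0.1 and 0.2, lowering ε from 1 to 0.01 raises the bandit expression from $15\ln T$ to $200\ln T$. While ε is at least every gap, the expression equals the non-private $\sum_j \ln T/Δ_j$; once ε falls below a gap, that arm's term becomes $\ln T/ε$, which is where the cost of privacy appears.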
