Locally Private Nonparametric Contextual Multi-armed Bandits

Yuheng Ma, Feiyu Jiang, Zifeng Zhao, Hanfang Yang, Yi Yu

TL;DR

We address nonparametric contextual MAB under local differential privacy, establishing minimax regret lower and upper bounds and introducing a uniform-confidence-bound estimator. The core method partitions the covariate space with dynamic dyadic bins, privately estimates arm rewards via Laplace noise, and performs arm elimination and bin refinement to achieve near-minimax rates. The paper extends to private transfer learning with auxiliary data under covariate shift, defining transfer-exponent and exploration-coefficient parameters and proving improved regret bounds through a jump-start stage. Empirical results on synthetic and real data validate the theoretical guarantees and demonstrate the practical value of leveraging auxiliary data under privacy constraints.
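The partition-and-privatize step described above can be illustrated with a minimal sketch. It is not the paper's estimator: it assumes a single arm, a one-dimensional context in $[0, 1]$, rewards in $[0, 1]$, and a fixed dyadic depth instead of dynamic refinement and arm elimination. The function names (`laplace_noise`, `dyadic_bin_index`, `privatize_user`, `server_bin_means`) and the noise scale $4/\varepsilon$ are choices made for this illustration, derived from the standard Laplace mechanism.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. Exponential(rate=1/scale) draws is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dyadic_bin_index(x, depth):
    """Index of the dyadic interval of width 2**-depth that contains x in [0, 1]."""
    n_bins = 2 ** depth
    return min(int(x * n_bins), n_bins - 1)


def privatize_user(x, y, depth, epsilon, rng):
    """User-side report for context x in [0, 1] and reward y in [0, 1].

    The raw report holds, for every bin j, the indicator of "x is in bin j"
    and that indicator times y. Changing (x, y) alters at most two entries
    of each half by at most 1, so the L1 sensitivity is at most 4 and
    Laplace(4 / epsilon) noise per coordinate gives epsilon-LDP.
    (Illustrative assumption; the paper's mechanism may differ.)
    """
    n_bins = 2 ** depth
    scale = 4.0 / epsilon
    b = dyadic_bin_index(x, depth)
    counts = [(1.0 if j == b else 0.0) + laplace_noise(scale, rng) for j in range(n_bins)]
    sums = [(y if j == b else 0.0) + laplace_noise(scale, rng) for j in range(n_bins)]
    return counts, sums


def server_bin_means(reports, depth):
    """Server-side mean-reward estimate per bin, using only the noisy reports."""
    n_bins = 2 ** depth
    total_counts = [0.0] * n_bins
    total_sums = [0.0] * n_bins
    for counts, sums in reports:
        for j in range(n_bins):
            total_counts[j] += counts[j]
            total_sums[j] += sums[j]
    # A noisy count can be near zero or negative; return None for such bins.
    return [s / c if c > 1.0 else None for s, c in zip(total_sums, total_counts)]


if __name__ == "__main__":
    rng = random.Random(0)
    depth, epsilon = 2, 1.0
    users = [(rng.random(), 0.7) for _ in range(5000)]
    reports = [privatize_user(x, y, depth, epsilon, rng) for x, y in users]
    print(server_bin_means(reports, depth))
```

Because the noise is added on the user side, the server never sees the raw context or reward, only the noisy vectors. Noising every bin coordinate also hides which bin the user falls in, at the cost of variance that grows with the number of bins.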

Abstract

Motivated by privacy concerns in sequential decision-making on sensitive data, we address the challenge of nonparametric contextual multi-armed bandits (MAB) under local differential privacy (LDP). We develop a uniform-confidence-bound-type estimator, showing its minimax optimality supported by a matching minimax lower bound. We further consider the case where auxiliary datasets are available, subject also to (possibly heterogeneous) LDP constraints. Under the widely-used covariate shift framework, we propose a jump-start scheme to effectively utilize the auxiliary data, the minimax optimality of which is further established by a matching lower bound. Comprehensive experiments on both synthetic and real-world datasets validate our theoretical results and underscore the effectiveness of the proposed methods.

Paper Structure

This paper contains 19 sections, 5 theorems, 41 equations, 9 figures, 2 tables, 3 algorithms.

Key Result

Theorem 2.5

Consider the class of distributions $\Lambda(K, \beta)$ defined in the paper and the class of LDP policies $\Pi(\varepsilon)$. A minimax lower bound on the regret holds (the displayed inequality is not reproduced here), where $c > 0$ is a constant depending only on $d$, $C_L$ and $\beta$. In particular, when $0 < \varepsilon \leq 1$, a simplified form of the bound holds with a constant $c' > 0$ (display likewise not reproduced).

Figures (9)

  • Figure 1: Illustration of the learning process. To achieve LDP, the server only receives privatized information $\tilde{Z}^{\mathrm{P}}_t$, while the context $X_t^{\mathrm{P}}$, the pulled arm $\pi_t(X_t^{\mathrm{P}})$, and the reward $Y_t^{\mathrm{P}}$ remain at the user end. The same applies to the auxiliary data.
  • Figure 2: Illustration of key steps of the proposed algorithm.
  • Figure 3: A partition created by the max-edge rule for $d = 2$. Blue areas give the corresponding bins.
  • Figure 4: Illustration of reward functions.
  • Figure 5: Illustration of marginal distribution $Q_{m,X}$ of source data.
  • ...and 4 more figures

Theorems & Definitions (8)

  • Definition 2.1: Local Differential Privacy
  • Theorem 2.5: Lower bound
  • Theorem 2.6: Upper bound
  • Proposition 2.7
  • Definition 3.1: Transfer exponent
  • Definition 3.2: Exploration coefficient
  • Theorem 3.3: Lower bound
  • Theorem 3.4: Upper bound
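
For orientation, the textbook non-interactive form of $\varepsilon$-local differential privacy is stated below. This is the standard definition, not a quotation of Definition 2.1; the paper's version may differ, for example by conditioning on previously released privatized outputs in the sequential setting. Here $R$ denotes a hypothetical randomized mechanism mapping a user's raw data $x$ to a privatized output $\tilde{Z}$.

$$
R(\tilde{Z} \in S \mid x) \;\leq\; e^{\varepsilon}\, R(\tilde{Z} \in S \mid x') \qquad \text{for all raw inputs } x, x' \text{ and all measurable sets } S.
$$

A smaller $\varepsilon$ forces the output distributions for any two inputs to be closer, which gives stronger privacy and a noisier signal for the learner.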