Locally Private Nonparametric Contextual Multi-armed Bandits
Yuheng Ma, Feiyu Jiang, Zifeng Zhao, Hanfang Yang, Yi Yu
TL;DR
We address nonparametric contextual multi-armed bandits (MAB) under local differential privacy (LDP), establishing minimax regret lower and upper bounds and introducing a uniform-confidence-bound-type estimator. The core method partitions the covariate space with dynamic dyadic bins, privately estimates arm rewards via Laplace noise, and performs arm elimination and bin refinement to achieve near-minimax rates. The paper extends to private transfer learning with auxiliary data under covariate shift, defining transfer-exponent and exploration-coefficient parameters and proving improved regret bounds through a jump-start stage. Empirical results on synthetic and real data validate the theoretical guarantees and demonstrate the practical value of leveraging auxiliary data under privacy constraints.
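To make the "dyadic bins plus Laplace noise" idea concrete, here is a minimal sketch of one way a locally private bin-wise reward estimate could be formed. It is an illustration under stated assumptions, not the paper's exact construction: the covariate is taken to be one-dimensional in [0, 1], rewards are assumed to lie in [0, 1], the bin depth is fixed rather than refined dynamically, and the privacy budget is split evenly between a noisy bin indicator and a noisy reward-weighted indicator. All function names are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dyadic_bin_index(x, depth):
    # Index of the dyadic bin of width 2**-depth containing x in [0, 1].
    n_bins = 2 ** depth
    return min(int(x * n_bins), n_bins - 1)

def privatize(x, reward, depth, epsilon, rng):
    # Assumption: reward is in [0, 1]. Both released vectors then have
    # L1 sensitivity at most 2; each gets budget epsilon / 2, so the
    # Laplace scale is 2 / (epsilon / 2) = 4 / epsilon.
    n_bins = 2 ** depth
    j = dyadic_bin_index(x, depth)
    scale = 4.0 / epsilon
    noisy_indicator = [(1.0 if k == j else 0.0) + laplace_noise(scale, rng)
                       for k in range(n_bins)]
    noisy_weighted = [(reward if k == j else 0.0) + laplace_noise(scale, rng)
                      for k in range(n_bins)]
    return noisy_indicator, noisy_weighted

def estimate_bin_means(reports):
    # Server side: ratio of aggregated noisy reward sums to noisy counts.
    # The denominator is floored at 1 to avoid dividing by tiny noisy counts.
    n_bins = len(reports[0][0])
    means = []
    for k in range(n_bins):
        count = sum(r[0][k] for r in reports)
        total = sum(r[1][k] for r in reports)
        means.append(total / max(count, 1.0))
    return means
```

In a bandit loop, one such estimate would be kept per arm and per active bin, with confidence widths that account for the added noise variance. The elimination and refinement rules themselves are not sketched here.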
Abstract
Motivated by privacy concerns in sequential decision-making on sensitive data, we address the challenge of nonparametric contextual multi-armed bandits (MAB) under local differential privacy (LDP). We develop a uniform-confidence-bound-type estimator and establish its minimax optimality via a matching minimax lower bound. We further consider the case where auxiliary datasets are available, which are also subject to (possibly heterogeneous) LDP constraints. Under the widely used covariate shift framework, we propose a jump-start scheme to effectively utilize the auxiliary data, and likewise establish its minimax optimality via a matching lower bound. Comprehensive experiments on both synthetic and real-world datasets validate our theoretical results and underscore the effectiveness of the proposed methods.
