DP-Dueling: Learning from Preference Feedback without Compromising User Privacy

Aadirupa Saha; Hilal Asi

TL;DR

This paper considers a general class of utility-based preference matrices for large (potentially unbounded) decision spaces and gives the first differentially private dueling bandit algorithm for active learning with user preferences.

Abstract

We consider the well-studied dueling bandit problem, where a learner aims to identify near-optimal actions using pairwise comparisons, under the constraint of differential privacy. We consider a general class of utility-based preference matrices for large (potentially unbounded) decision spaces and give the first differentially private dueling bandit algorithm for active learning with user preferences. Our proposed algorithms are computationally efficient with near-optimal performance, both in terms of the private and non-private regret bound. More precisely, we show that when the decision space is of finite size $K$, our proposed algorithm yields an order-optimal $O\Big(\sum_{i = 2}^K\log\frac{KT}{\Delta_i} + \frac{K}{\epsilon}\Big)$ regret bound for pure $\epsilon$-DP, where $\Delta_i$ denotes the suboptimality gap of the $i$-th arm. We also present a matching lower bound analysis which proves the optimality of our algorithms. Finally, we extend our results to any general decision space in $d$ dimensions with potentially infinite arms and design an $\epsilon$-DP algorithm with regret $\tilde{O} \left( \frac{d^6}{\kappa\epsilon} + \frac{d\sqrt{T}}{\kappa} \right)$, providing privacy for free when $T \gg d$.
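
To make the "privacy for free" regime concrete, the following short calculation compares the two terms of the infinite-arm bound. It takes the bound exactly as stated in the abstract and ignores the logarithmic factors hidden in $\tilde{O}$; the resulting threshold on $T$ is a reading of that bound under those assumptions, not a statement quoted from the paper.

$$
\frac{d^6}{\kappa\epsilon} \;\le\; \frac{d\sqrt{T}}{\kappa}
\quad\Longleftrightarrow\quad
\frac{d^5}{\epsilon} \;\le\; \sqrt{T}
\quad\Longleftrightarrow\quad
T \;\ge\; \frac{d^{10}}{\epsilon^2}.
$$

Under this reading, once $T$ exceeds roughly $d^{10}/\epsilon^2$ the privacy term $\frac{d^6}{\kappa\epsilon}$ is dominated by the non-private term $\frac{d\sqrt{T}}{\kappa}$, so the overall rate matches the non-private one up to constants and logarithmic factors.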

Paper Structure (35 sections, 25 theorems, 80 equations, 2 algorithms)

Introduction
Informal Problem Setting.
Objective.
Our Contributions
Problem Setting and Notation
Problem Setting.
Differentially Private Dueling Bandit (DP-DB).
Performance Measure: Regret Minimization Under $(\epsilon,\delta)$-DP
Preliminaries: Some Useful Concepts
Kiefer–Wolfowitz Theorem.
The Binary Tree Mechanism.
Warm Up: Finite armed DP-DB
Algorithm: DP-EBS-Elimination
(1) Round-Robin Duel Selection on the Active Set.
(2) Maintaining EBS Estimates and UCBs.
...and 20 more sections

Key Result

Lemma 1

Let $\epsilon \le 1$. There is an $\epsilon$-DP algorithm ($\mathsf{BinTree}$) that takes a stream of numbers $a_1,a_2,\dots,a_T \in [0,1]$ and outputs $c_1,c_2,\dots,c_T$ such that for all $t \in [T]$ and any $\delta \in (0,1)$, with probability at least $1-\delta$,
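
The $\mathsf{BinTree}$ algorithm referenced in this lemma is the binary tree mechanism for privately releasing running sums of a stream. A minimal sketch of the standard construction follows, assuming the horizon $T$ is known in advance and that each tree node receives independent Laplace noise of scale $(\text{number of levels})/\epsilon$. The class name `BinaryTreeCounter`, the helper `laplace_noise`, and the `noise` parameter are hypothetical names chosen for illustration; the paper's exact noise calibration and constants may differ.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two iid exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class BinaryTreeCounter:
    """Sketch of the binary tree mechanism for private prefix sums.

    Assumptions (not taken from the paper): the horizon is known up front,
    and every completed tree node is released once with Laplace noise of
    scale levels / epsilon.
    """

    def __init__(self, horizon, epsilon, noise=laplace_noise):
        if horizon < 1 or epsilon <= 0:
            raise ValueError("horizon must be >= 1 and epsilon must be > 0")
        # Number of tree levels needed to represent any t <= horizon in binary.
        self.levels = int(horizon).bit_length()
        # Each input lies in [0, 1] and touches at most one node per level,
        # so the L1 sensitivity of all released nodes is at most `levels`.
        self.scale = self.levels / epsilon
        self.noise = noise
        self.exact = [0.0] * self.levels   # exact partial sums (kept private)
        self.noisy = [0.0] * self.levels   # noisy partial sums (released)
        self.t = 0

    def update(self, a):
        """Consume a_t in [0, 1] and return a noisy estimate of a_1 + ... + a_t."""
        if not 0.0 <= a <= 1.0:
            raise ValueError("stream values must lie in [0, 1]")
        self.t += 1
        t = self.t
        # Index of the lowest set bit of t: the level of the node completed now.
        i = (t & -t).bit_length() - 1
        # Merge all lower-level partial sums and the new value into node i.
        self.exact[i] = sum(self.exact[:i]) + a
        for j in range(i):
            self.exact[j] = 0.0
            self.noisy[j] = 0.0
        # Fresh noise is drawn exactly once per completed node.
        self.noisy[i] = self.exact[i] + self.noise(self.scale)
        # The prefix sum is the sum of nodes matching the set bits of t.
        return sum(self.noisy[j] for j in range(self.levels) if (t >> j) & 1)


if __name__ == "__main__":
    counter = BinaryTreeCounter(horizon=8, epsilon=1.0)
    for value in [1, 0, 1, 1, 0, 1, 0, 1]:
        print(round(counter.update(value), 3))
```

Each output sums at most `levels` noisy nodes, which is why the error grows only polylogarithmically in $T$ rather than linearly. Passing `noise=lambda scale: 0.0` disables the noise and recovers the exact prefix sums, which is a convenient way to check the bookkeeping.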

Theorems & Definitions (46)

Definition 1: $(\epsilon, \delta)$-differentially private dueling bandit
Definition 2: $\varepsilon$-net for any set $\mathcal{S}$ [matouvsek1989construction, vershynin2018high]
Remark 1
Definition 3: G-Optimal Design
Lemma 1: DworkNaPiRo10, Theorem 4.1
Theorem 1: Regret Analysis of
proof: Proof sketch of Theorem 1
Lemma 2
proof
Lemma 3
...and 36 more
