Robust Pareto Set Identification with Contaminated Bandit Feedback

İlter Onat Korkmaz; Efe Eren Ceyani; Kerem Bozgan; Cem Tekin

TL;DR

This work considers the Pareto set identification (PSI) problem in multi-objective multi-armed bandits (MO-MAB) with contaminated reward observations. It proposes a sample median-based multi-objective adaptive elimination algorithm that returns an (α, δ)-PAC Pareto set upon termination, with a sample complexity bound that depends on the contamination probability.
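PSI targets the set of arms whose mean reward vectors are not Pareto-dominated by any other arm. As a minimal sketch of that target set (not the paper's algorithm), the Python below computes the exact Pareto set from known mean vectors. It assumes larger values are better in every objective; the function names and the example means are hypothetical.

```python
from typing import Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u Pareto-dominates v: u >= v in every objective and u > v in at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_set(means: Sequence[Sequence[float]]) -> list[int]:
    """Indices of arms whose mean vectors are not dominated by any other arm."""
    return [
        i
        for i, mu_i in enumerate(means)
        if not any(dominates(mu_j, mu_i) for j, mu_j in enumerate(means) if j != i)
    ]


# Hypothetical two-objective example with four arms.
means = [(0.9, 0.1), (0.6, 0.6), (0.5, 0.5), (0.1, 0.9)]
print(pareto_set(means))  # [0, 1, 3]: arm 2 is dominated by arm 1
```

In a bandit the means are unknown and must be estimated from noisy (here, contaminated) samples, which is why PSI algorithms replace exact comparisons with confidence regions.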

Abstract

We consider the Pareto set identification (PSI) problem in multi-objective multi-armed bandits (MO-MAB) with contaminated reward observations. At each arm pull, with some fixed probability, the true reward sample is replaced with a sample from an arbitrary contamination distribution chosen by an adversary. We consider (α, δ)-PAC PSI and propose a sample median-based multi-objective adaptive elimination algorithm that returns an (α, δ)-PAC Pareto set upon termination, with a sample complexity bound that depends on the contamination probability. As the contamination probability decreases, we recover the well-known sample complexity results in MO-MAB. We compare the proposed algorithm with a mean-based method from the MO-MAB literature, as well as an extended version of it that uses median estimators, on several PSI problems under adversarial corruptions, including review bombing and diabetes management. Our numerical results support our theoretical findings and demonstrate that robust algorithm design is crucial for accurate PSI under contaminated reward observations.
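The contamination model can be simulated directly. The sketch below is illustrative only: it assumes a single objective, a standard Gaussian true reward, and an adversary that always reports one large constant (a hypothetical choice; the abstract allows an arbitrary contamination distribution). It contrasts the sample mean with the sample median under contamination probability eps.

```python
import random
import statistics


def contaminated_pull(rng: random.Random, eps: float) -> float:
    """One observation: with probability eps the true reward is replaced by the adversary's sample."""
    if rng.random() < eps:
        return 1_000.0  # hypothetical adversary: always reports a huge constant
    return rng.gauss(0.0, 1.0)  # assumed true reward ~ N(0, 1), with mean and median 0


def mean_and_median(eps: float, n: int = 20_000, seed: int = 0) -> tuple[float, float]:
    """Sample mean and sample median of n contaminated observations."""
    rng = random.Random(seed)
    xs = [contaminated_pull(rng, eps) for _ in range(n)]
    return statistics.fmean(xs), statistics.median(xs)


mean_hat, median_hat = mean_and_median(eps=0.1)
print(f"sample mean = {mean_hat:.2f}, sample median = {median_hat:.2f}")
```

With eps = 0.1 the sample mean is dragged toward the adversary's value (its expectation is 0.9 · 0 + 0.1 · 1000 = 100), while the sample median stays close to 0. The median is still slightly shifted, and that shift does not disappear as more samples are collected.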

Paper Structure

This paper contains 26 sections, 13 theorems, 93 equations, 1 figure, 2 tables, and 1 algorithm.

Introduction
Contribution and Comparison with Related Works
Organization
Notation
Problem Formulation
Adversarial Contamination Models
Unavoidable Bias and Median Concentration
Multi-objective Suboptimality Gap
Pareto Accuracy
A Robust Learning Algorithm
Accuracy and Sample Complexity Analysis
Good Event
Main Results
Numerical Results
Experiments on MovieLens Dataset
...and 11 more sections

Key Result

Lemma 1

(Upper bound on empirical median deviation for prescient and oblivious adversaries; Altschuler et al.) Let $\bar{t} \in (0, \frac{1}{2})$, $\epsilon \in (0, \frac{2\bar{t}}{1 + 2\bar{t}})$, $\delta \in (0,1)$ and $F \in C_{R,\bar{t}}$, where $R$ is a non-decreasing function defined on domain $[0, \bar{t}
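The condition on $\epsilon$ in Lemma 1 is equivalent to $\frac{\epsilon}{2(1-\epsilon)} < \bar{t}$, since $\epsilon < 2\bar{t}(1-\epsilon)$ rearranges to $\epsilon(1 + 2\bar{t}) < 2\bar{t}$; so the worst-case shift of the median, measured in quantile units, stays below $\bar{t}$. The sketch below checks this numerically and, assuming a Gaussian clean distribution (one possible member of a class $C_{R,\bar{t}}$, not a setting taken from the paper), computes the worst-case population-median bias $\sigma\,\Phi^{-1}\big(\frac{1}{2(1-\epsilon)}\big)$ that arises when the adversary places all of its mass above the clean distribution. All function names are hypothetical.

```python
from statistics import NormalDist


def max_contamination(t_bar: float) -> float:
    """Upper end of the admissible range eps < 2 t_bar / (1 + 2 t_bar) in Lemma 1."""
    return 2 * t_bar / (1 + 2 * t_bar)


def quantile_shift(eps: float) -> float:
    """Worst-case movement of the median in quantile units: eps / (2 (1 - eps))."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    return eps / (2 * (1 - eps))


def gaussian_median_bias(eps: float, sigma: float = 1.0) -> float:
    """Worst-case population-median bias for a clean N(mu, sigma^2) reward.

    If all contaminating mass lies above the clean distribution, the contaminated
    median m solves (1 - eps) F(m) = 1/2, i.e. it sits at the clean quantile
    1/2 + eps / (2 (1 - eps)) = 1 / (2 (1 - eps)).
    """
    return sigma * NormalDist().inv_cdf(0.5 + quantile_shift(eps))


t_bar, eps = 0.25, 0.2
assert eps < max_contamination(t_bar)  # 0.2 < 1/3, so the lemma's condition holds
print(quantile_shift(eps), gaussian_median_bias(eps))
```

For $\bar{t} = 0.25$ the admissible contamination probability is below $1/3$; at $\epsilon = 0.2$ the quantile shift is $0.125$ and the Gaussian median bias is roughly $0.32\sigma$.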

Figures (1)

Figure 1: Visualization of the R-PSI algorithm in a two-dimensional objective space. The red circles mark the points used in comparisons. The blue squares are the confidence regions of the arms given by the good-event lemma, whereas the shaded squares are the smaller regions used in the identification phase. The left figure visualizes the elimination phase, where arm $3$ eliminates arm $4$. The middle and right figures visualize the Pareto identification phase. Both arm $2$ and arm $3$ are added to $O_1$, as shown in the middle figure. However, arm $2$ may still be useful for eliminating arm $1$ in future rounds, so it is not added to $O_2$, as depicted in the right figure. As a result, only arm $3$ is added to the estimated Pareto set $P$.
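The elimination and identification steps in Figure 1 compare corners of per-arm confidence boxes. The sketch below is a simplified, hypothetical version of such box comparisons, not the exact R-PSI rules. It assumes axis-aligned boxes of a common half-width `beta` around each arm's estimate, with larger values better. An arm is eliminated when another arm's pessimistic corner strictly exceeds its optimistic corner in every objective, and an arm is identified when, against every other active arm, some objective separates the two boxes. It ignores the accuracy parameter α and the additional $O_2$ check described in the caption.

```python
from typing import Sequence

Vector = Sequence[float]


def lower(center: Vector, beta: float) -> list[float]:
    """Pessimistic corner of an axis-aligned box of half-width beta."""
    return [c - beta for c in center]


def upper(center: Vector, beta: float) -> list[float]:
    """Optimistic corner of the same box."""
    return [c + beta for c in center]


def eliminates(j: Vector, i: Vector, beta: float) -> bool:
    """Hypothetical rule: arm j removes arm i if j's pessimistic corner
    strictly exceeds i's optimistic corner in every objective."""
    return all(lj > ui for lj, ui in zip(lower(j, beta), upper(i, beta)))


def surely_undominated(i: Vector, others: Sequence[Vector], beta: float) -> bool:
    """Arm i cannot be dominated: against each other arm j, some objective
    has i's pessimistic value above j's optimistic value."""
    return all(
        any(li > uj for li, uj in zip(lower(i, beta), upper(j, beta)))
        for j in others
    )


# Hypothetical per-arm estimates (box centers) in two objectives.
centers = {1: (0.2, 0.9), 2: (0.55, 0.6), 3: (0.8, 0.5), 4: (0.5, 0.2)}
beta = 0.1

# Elimination phase: decide all removals against the same set of arms.
eliminated = {
    i for i in centers
    if any(eliminates(centers[j], centers[i], beta) for j in centers if j != i)
}
active = sorted(set(centers) - eliminated)

# Identification phase: keep arms separated from every other active arm.
identified = [
    i for i in active
    if surely_undominated(centers[i], [centers[j] for j in active if j != i], beta)
]
print(eliminated, identified)
```

The made-up centers are chosen so that arm 3 eliminates arm 4, as in the left panel; with them, arms 1 and 3 pass this simplified identification check. Deciding all eliminations in one pass keeps the result independent of the order in which arms are examined.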

Theorems & Definitions (27)

Definition 1
Definition 2
Lemma 1
Lemma 2
Example 1
Definition 3
Definition 4
Lemma 3
Definition 5
Remark 1
...and 17 more
