Robust Pareto Set Identification with Contaminated Bandit Feedback

İlter Onat Korkmaz, Efe Eren Ceyani, Kerem Bozgan, Cem Tekin

TL;DR

This work considers the Pareto set identification (PSI) problem in multi-objective multi-armed bandits (MO-MAB) with contaminated reward observations. It proposes a sample median-based multi-objective adaptive elimination algorithm that returns an (α, δ)-PAC Pareto set upon termination, with a sample complexity bound that depends on the contamination probability.

Abstract

We consider the Pareto set identification (PSI) problem in multi-objective multi-armed bandits (MO-MAB) with contaminated reward observations. At each arm pull, with some fixed probability, the true reward samples are replaced with samples from an arbitrary contamination distribution chosen by an adversary. We consider (α, δ)-PAC PSI and propose a sample median-based multi-objective adaptive elimination algorithm that returns an (α, δ)-PAC Pareto set upon termination, with a sample complexity bound that depends on the contamination probability. As the contamination probability decreases, we recover the well-known sample complexity results in MO-MAB. We compare the proposed algorithm with a mean-based method from the MO-MAB literature, as well as an extended version of that method that uses median estimators, on several PSI problems under adversarial corruptions, including review bombing and diabetes management. Our numerical results support our theoretical findings and demonstrate that robust algorithm design is crucial for accurate PSI under contaminated reward observations.
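
The contamination model in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm; it only shows why a coordinate-wise sample median is a more robust estimator than a sample mean when a fixed fraction of pulls is replaced by adversarial values. The names `pull_contaminated`, `coordinate_median`, and `coordinate_mean`, the Gaussian reward model, the contamination probability `eps = 0.1`, and the constant adversarial value `1000.0` are all assumptions made for illustration.

```python
import random
import statistics

def pull_contaminated(true_mean, eps, rng, noise_std=0.5, bad_value=1000.0):
    """One pull of a multi-objective arm under the contamination model.

    With probability eps the whole reward vector is replaced by an
    adversarial value (here a hypothetical constant); otherwise each
    objective is drawn from a Gaussian around its true mean (an assumed
    reward model, not taken from the paper).
    """
    if rng.random() < eps:
        return [bad_value for _ in true_mean]
    return [rng.gauss(m, noise_std) for m in true_mean]

def coordinate_median(samples):
    """Sample median taken separately in each objective."""
    return [statistics.median(column) for column in zip(*samples)]

def coordinate_mean(samples):
    """Sample mean taken separately in each objective."""
    return [statistics.fmean(column) for column in zip(*samples)]

rng = random.Random(0)          # fixed seed so the run is reproducible
true_mean = [1.0, 2.0]          # hypothetical two-objective arm
eps = 0.1                       # assumed contamination probability
samples = [pull_contaminated(true_mean, eps, rng) for _ in range(2000)]

median_est = coordinate_median(samples)
mean_est = coordinate_mean(samples)
print("median estimate:", median_est)
print("mean estimate:  ", mean_est)
```

In this setup the mean estimate is pulled far from the true values by the contaminated pulls, while the median estimate stays close to them with a small bias that grows with `eps`. This is consistent with the abstract's statement that the sample complexity bound depends on the contamination probability.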

Paper Structure

This paper contains 26 sections, 13 theorems, 93 equations, 1 figure, 2 tables, 1 algorithm.

Key Result

Lemma 1

(Upper bound on empirical median deviation for prescient and oblivious adversaries) altschuler. Let $\bar{t} \in (0, \frac{1}{2})$, $\epsilon \in (0, \frac{2\bar{t}}{1 + 2\bar{t}})$, $\delta \in (0,1)$ and $F \in {C_{R,\bar{t}}}$, where $R$ is a non-decreasing function defined on domain $[0, \bar{t}

Figures (1)

  • Figure 1: Visualization of the R-PSI algorithm in a two-dimensional objective space. The red circles mark the points used in comparisons. The blue squares are the confidence regions of the arms given in Lemma \ref{lem:good_event}, whereas the shaded squares are the smaller regions used in the identification phase. The left figure visualizes the elimination phase, where arm $3$ eliminates arm $4$. The middle and right figures visualize the Pareto identification phase of the algorithm. Both arm $2$ and arm $3$ are added to $O_1$, as shown in the middle figure. However, arm $2$ is suspected to be useful for eliminating arm $1$ in future rounds, so it is not added to $O_2$, as depicted in the right figure. As a result, only arm $3$ is added to the estimated Pareto set $P$.
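
The elimination step shown in the left panel can be sketched as a generic box-dominance check. This is a simplified illustration rather than the paper's exact rule: it assumes square confidence regions with one common half-width `radius`, ignores the accuracy parameter α, and uses hypothetical arm coordinates that are not read off Figure 1. An arm is discarded when the lower corner of another arm's box dominates the upper corner of its own box.

```python
def dominates(u, v):
    """u dominates v: at least as good in every objective, strictly better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def box_eliminates(center_j, center_i, radius):
    """True if arm j's confidence box lies entirely above arm i's box.

    The pessimistic corner of j (center minus radius) is compared with the
    optimistic corner of i (center plus radius).
    """
    lower_j = [c - radius for c in center_j]
    upper_i = [c + radius for c in center_i]
    return dominates(lower_j, upper_i)

def surviving_arms(centers, radius):
    """Arms that no other arm eliminates under the box-dominance check."""
    return [
        i for i, center_i in centers.items()
        if not any(
            box_eliminates(center_j, center_i, radius)
            for j, center_j in centers.items() if j != i
        )
    ]

# Hypothetical median estimates for four arms in a two-objective problem.
centers = {
    1: [0.40, 0.65],
    2: [0.50, 0.80],
    3: [0.80, 0.60],
    4: [0.30, 0.20],
}
survivors = surviving_arms(centers, radius=0.1)
print(survivors)
```

With these assumed coordinates, arm 3 eliminates arm 4, mirroring the left panel, while arms 1, 2, and 3 remain because their boxes overlap in at least one objective.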

Theorems & Definitions (27)

  • Definition 1
  • Definition 2
  • Lemma 1
  • Lemma 2
  • Example 1
  • Definition 3
  • Definition 4
  • Lemma 3
  • Definition 5
  • Remark 1
  • ...and 17 more