Blessings of Multiple Good Arms in Multi-Objective Linear Bandits

Heesang Ann, Min-hwan Oh

TL;DR

This is the first study to introduce implicit exploration in both multi-objective and parametric bandit settings without any distributional assumptions on the contexts. It also introduces a framework for effective Pareto fairness, a principled approach to rigorously analyzing the fairness of multi-objective bandit algorithms.

Abstract

The multi-objective bandit setting has traditionally been regarded as more complex than the single-objective case, since multiple objectives must be optimized simultaneously. In contrast to this prevailing view, we demonstrate that when multiple good arms exist for multiple objectives, they can induce a surprising benefit: implicit exploration. Under this condition, we show that simple algorithms that select actions greedily in most rounds can nonetheless achieve strong performance, both theoretically and empirically. To our knowledge, this is the first study to introduce implicit exploration in both multi-objective and parametric bandit settings without any distributional assumptions on the contexts. We further introduce a framework for effective Pareto fairness, which provides a principled approach to rigorously analyzing the fairness of multi-objective bandit algorithms.

Paper Structure

This paper contains 76 sections, 36 theorems, 97 equations, 12 figures, 1 table, 4 algorithms.

Key Result

Proposition 1

For any $a_*\in\mathcal{C}^*$, there exists $w \in \mathbb{S}^{M-1}$ satisfying $a_* = \arg\max_{i\in[K]} w^\top\mu_i$. Conversely, for any $w \in \mathbb{S}^{M-1}$, if the maximizer $a_* = \arg\max_{i\in[K]} w^\top\mu_i$ is unique, then $a_*\in\mathcal{C}^*$.
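
The converse direction of Proposition 1 can be checked numerically on a toy instance. The sketch below is only an illustration under stated assumptions: the mean vectors `mu` are made up, the weights are taken to be nonnegative and to sum to one, and because $\mathcal{C}^*$ is not defined in this excerpt, membership is approximated by standard Pareto non-dominance (which may be a weaker condition than the paper's effective Pareto front). All function names are hypothetical.

```python
import numpy as np

def dominates(u, v):
    """True if u Pareto-dominates v: u >= v everywhere and u > v somewhere."""
    return bool(np.all(u >= v) and np.any(u > v))

def pareto_front(mu):
    """Indices of arms whose mean vector no other arm dominates.

    Stand-in for C^*, which this excerpt does not define (assumption).
    """
    K = mu.shape[0]
    return [i for i in range(K)
            if not any(dominates(mu[j], mu[i]) for j in range(K) if j != i)]

def scalarized_best(mu, w):
    """Return (index of argmax_i w^T mu_i, whether that maximizer is unique)."""
    scores = mu @ w
    best = int(np.argmax(scores))
    is_unique = int(np.sum(np.isclose(scores, scores[best]))) == 1
    return best, is_unique

# Hypothetical mean rewards: K = 4 arms (rows), M = 2 objectives (columns).
mu = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.4, 0.4],
    [0.2, 0.2],
])

front = pareto_front(mu)

# A weight with a unique maximizer: that arm must be non-dominated.
best, unique = scalarized_best(mu, np.array([0.7, 0.3]))
if unique:
    assert best in front

# Equal weights tie arms 0 and 1, so the uniqueness premise fails here.
tie_best, tie_unique = scalarized_best(mu, np.array([0.5, 0.5]))
```

In this toy instance, arm 2 is non-dominated in the standard sense, yet no nonnegative weight summing to one makes it the maximizer, since its score is always 0.4 while the larger of the first two arms' scores is at least 0.5. This is why the stand-in may contain more arms than the set reachable by scalarization.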

Figures (12)

  • Figure 1: Evaluation of multi-objective bandit algorithms in the fixed-feature setting. The plots in the left two columns report the performance of the algorithms, while the plots in the rightmost column report the running time. The top row shows results for $d=5$, $K=50$, $M=5$, and the bottom row shows results for $d=10$, $K=200$, $M=10$.
  • Figure 4.1: The larger circle represents the unit sphere in $\mathbb{R}^d$, while the interior of the smaller circle indicates the region where $\widetilde{\theta}(s)$ may lie. The blue line illustrates the distance between $\theta_{m}^*$ and the $\gamma$-good arm for $\widetilde{\theta}(s)$.
  • Figure 7.1: The interior of the circle with radius $\frac{x_{\max}}{\gamma}$ represents the region where $\frac{x}{\gamma}$ may lie in $\mathbb{R}^d$, while that of the smallest circle indicates the region where $\widetilde{\theta}(s)$ may lie. The blue line illustrates the case in which $\frac{x}{\gamma}$ is farthest from $\theta_{m}^*$.
  • Figure 7.2: The larger circle represents the unit sphere in $\mathbb{R}^d$, while the interior of the smallest circle indicates the region where $\widetilde{\theta}(s)$ may lie. The blue line illustrates the case in which $x$ is farthest from $\frac{\theta_{m}^*}{\|\theta_{m}^*\|_2}$.
  • Figure 9.1: Problem space $\Theta$ construction when $d=2$.
  • ...and 7 more figures

Theorems & Definitions (57)

  • Definition 1: Pareto order
  • Definition 2: Effective Pareto front
  • Proposition 1: Theorem 1 of park2025thompson
  • Definition 3: Effective Pareto regret
  • Definition 4: Effective Pareto fairness
  • Remark 1
  • Definition 5: Goodness of arms
  • Remark 2
  • Remark 3
  • Definition 6: Regularity indices of a distribution
  • ...and 47 more