
Best Group Identification in Multi-Objective Bandits

Mohammad Shahverdikondori, Mohammad Reza Badri, Negar Kiyavash

TL;DR

The paper tackles best group identification (BGI) in multi-objective bandits and introduces two problem variants: GPSI (group Pareto set identification, where the optimal groups are those with Pareto-optimal efficiency vectors) and LBGI (linear best group identification, a linear objective with known weights). It proposes elimination-based algorithms, Triple Elimination (TE) for GPSI and Equal Effect Confidence Bound (EECB) for LBGI, with instance-dependent sample-complexity guarantees and accompanying lower bounds that show near-optimal performance in several regimes. Extensive experiments show substantial gains over naive baselines and weight-agnostic approaches. The work advances pure exploration in multi-objective bandits and offers practical strategies for efficiently identifying optimal groups in the fixed-confidence setting, with potential extensions to partially known weights.

Abstract

We introduce the Best Group Identification problem in a multi-objective multi-armed bandit setting, where an agent interacts with groups of arms that have vector-valued rewards. The performance of a group is determined by an efficiency vector, which represents the group's best attainable rewards across the different dimensions. The objective is to identify the set of optimal groups in the fixed-confidence setting. We investigate two key formulations: group Pareto set identification, where the efficiency vectors of optimal groups are Pareto optimal, and linear best group identification, where each reward dimension has a known weight and the optimal group maximizes the weighted sum of its efficiency vector's entries. For both settings, we propose elimination-based algorithms, establish upper bounds on their sample complexity, and derive lower bounds that apply to any correct algorithm. Through numerical experiments, we demonstrate the strong empirical performance of the proposed algorithms.
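To make the two objectives concrete, the following Python sketch computes efficiency vectors from known arm means and then finds the GPSI target (groups whose efficiency vectors are Pareto optimal) and the LBGI target (groups maximizing the weighted sum). It assumes, as a reading of the abstract, that entry d of a group's efficiency vector is the largest mean among the group's arms in dimension d, and that domination is standard Pareto dominance; the function names and the toy instance are hypothetical. Because it uses the true means, it only illustrates what the algorithms must identify, not how they learn it from samples.

```python
import numpy as np

def efficiency_vectors(mu, groups):
    """mu: (num_arms, D) array of mean reward vectors; groups: list of arm-index lists.
    Assumed definition: entry d is the best mean among the group's arms in dimension d."""
    return np.array([mu[g].max(axis=0) for g in groups])

def dominates(u, v):
    """Standard Pareto dominance (assumed): u is at least as good as v in every
    dimension and strictly better in at least one."""
    return bool(np.all(u >= v) and np.any(u > v))

def pareto_groups(eff):
    """GPSI target: indices of groups whose efficiency vectors no other group dominates."""
    n = len(eff)
    return [i for i in range(n)
            if not any(dominates(eff[j], eff[i]) for j in range(n) if j != i)]

def linear_best_groups(eff, w):
    """LBGI target: indices of groups maximizing the weighted sum w . eff (ties kept)."""
    scores = eff @ w
    return [int(i) for i in np.flatnonzero(np.isclose(scores, scores.max()))]

# Toy instance (hypothetical values): 4 arms, 2 objectives, 4 groups.
mu = np.array([[0.9, 0.2], [0.3, 0.7], [0.6, 0.6], [0.5, 0.4]])
groups = [[0], [1, 3], [2], [3]]
eff = efficiency_vectors(mu, groups)  # rows: [0.9, 0.2], [0.5, 0.7], [0.6, 0.6], [0.5, 0.4]
print(pareto_groups(eff))                             # [0, 1, 2]
print(linear_best_groups(eff, np.array([0.7, 0.3])))  # [0]
print(linear_best_groups(eff, np.array([0.2, 0.8])))  # [1]
```

In this toy instance, group 3 is dominated by group 1 and drops out of the Pareto set, while different known weight vectors single out different Pareto-optimal groups, so with known weights only one of them has to be identified.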

Paper Structure

This paper contains 29 sections, 24 theorems, 110 equations, 1 figure, 1 table, 2 algorithms.

Key Result

Theorem 3.3

For any environment $\mathcal{V}$ with arm-means tensor $\boldsymbol{\mu}$, any $\epsilon > 0$ and any $\delta \in (0,1)$, Algorithm TE with parameter $\beta(r, \delta)$ (as defined in the paper) is $(\epsilon, \delta)$-PAC. Moreover, with probability at least $1 - \delta$, the total number of arm pulls is at most …, where $C$ is a universal constant and $\tilde{\Delta}_{i,j} \triangleq \max(\Delta_{i,j}, \Delta_i, \ldots)$.
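Theorem 3.3 concerns TE, whose details (and the exact form of $\beta(r,\delta)$) are not reproduced in this summary. As a hypothetical illustration of the general fixed-confidence elimination pattern only, the sketch below is a naive uniform-sampling baseline for the linear objective; it is not TE or EECB. It pulls every arm once per round, builds Hoeffding confidence intervals with a union bound over arms, dimensions, and rounds, and discards any group whose upper confidence bound falls below the best lower bound. It assumes rewards in $[0,1]^D$ and a unique best group; `naive_lbgi_elimination`, `sample`, and the toy instance are invented for illustration.

```python
import math
import numpy as np

def naive_lbgi_elimination(sample, num_arms, groups, w, delta, max_rounds=100_000):
    """Hypothetical naive baseline (not TE or EECB) for the linear objective.

    sample(a) must return one reward vector in [0, 1]^D for arm a;
    groups is a list of arm-index lists; w holds the known weights.
    Returns (surviving group indices, rounds used).
    """
    D = len(w)
    sums = np.zeros((num_arms, D))
    active = list(range(len(groups)))
    for t in range(1, max_rounds + 1):
        # Uniform sampling: every arm is pulled once per round.
        for a in range(num_arms):
            sums[a] += sample(a)
        mu_hat = sums / t
        # Hoeffding radius with a union bound over arms, dimensions and rounds:
        # with probability >= 1 - delta, every empirical mean stays within r of the truth.
        r = math.sqrt(math.log(4 * num_arms * D * t * t / delta) / (2 * t))
        # A coordinate-wise max moves by at most r when its inputs do,
        # so each weighted group score is off by at most r * sum(|w|).
        width = r * np.abs(w).sum()
        scores = {g: float(mu_hat[groups[g]].max(axis=0) @ w) for g in active}
        best_lcb = max(s - width for s in scores.values())
        # Keep only groups whose upper confidence bound reaches the best lower bound.
        active = [g for g in active if scores[g] + width >= best_lcb]
        if len(active) == 1:
            return active, t
    return active, max_rounds

# Usage on a toy Bernoulli instance (hypothetical values).
rng = np.random.default_rng(0)
mu = np.array([[0.9, 0.2], [0.3, 0.7], [0.6, 0.6], [0.5, 0.4]])
groups = [[0], [1, 3], [2], [3]]

def sample(a):
    # Independent Bernoulli reward in each dimension with mean mu[a].
    return (rng.random(2) < mu[a]).astype(float)

active, rounds = naive_lbgi_elimination(sample, 4, groups, np.array([0.7, 0.3]), delta=0.05)
print(active, rounds)
```

With weights (0.7, 0.3), group 0 has weighted score 0.69 against 0.60 for the runner-up, so the sketch should report `[0]` with probability at least $1 - \delta$. A practical method would stop sampling arms that belong only to eliminated groups; this sketch keeps uniform sampling for simplicity.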

Figures (1)

  • Figure 1: Results of the experiments for the GPSI problem.

Theorems & Definitions (50)

  • Definition 2.1: Domination
  • Definition 3.1: $\epsilon$-Pareto
  • Definition 3.2: Dimension Resolution
  • Theorem 3.3: Correctness and Upper Bound for Algorithm TE
  • Remark 3.4
  • Theorem 3.5: GPSI Lower Bound
  • Theorem 4.1: Correctness and Upper Bound for Algorithm EECB
  • Theorem 4.2: LBGI Lower Bound
  • Remark 4.3
  • Theorem B.1: Correctness and Upper Bound for Algorithm TE
  • ...and 40 more