Sequential Learning of the Pareto Front for Multi-objective Bandits

Elise Crépon, Aurélien Garivier, Wouter M Koolen

TL;DR

The paper targets fast, fixed-confidence identification of the Pareto front in multi-objective bandits with rewards in $\mathbb{R}^d$ and $K$ arms, aiming for success probability $1-\delta$. It adapts the Track-and-Stop framework to Pareto-front identification and optimizes the inner, information-theoretic lower-bound problem via online gradient ascent, focusing on Gaussian arms with identity covariance. A key contribution is the decomposition of the transport cost into removing or adding Pareto points and a cell-based algorithm that achieves instance-optimal sample complexity with per-round cost on the order of $O(K p^d)$; the authors derive explicit costs for removal and develop a tractable procedure for addition, including a cell enumeration strategy with a combinatorial bound $\binom{p+d-1}{d-1}$. Empirical results on real and synthetic data demonstrate substantial improvements in sample efficiency and provide practical insights into scalability for moderate $p$ and $d$.
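To get a feel for how the cell-count bound $\binom{p+d-1}{d-1}$ quoted above grows for moderate $p$ and $d$, the following minimal sketch simply evaluates the binomial coefficient. The helper name `cell_bound` is hypothetical and is not taken from the paper; the snippet only evaluates the stated formula and does not implement the cell enumeration itself.

```python
from math import comb


def cell_bound(p: int, d: int) -> int:
    """Evaluate the combinatorial bound C(p + d - 1, d - 1) from the summary.

    p: number of Pareto-optimal arms, d: reward dimension.
    (Hypothetical helper; it only evaluates the quoted formula.)
    """
    if p < 1 or d < 1:
        raise ValueError("p and d must be positive integers")
    return comb(p + d - 1, d - 1)


if __name__ == "__main__":
    # Tabulate the bound for a few moderate values of p and d.
    for d in (2, 3, 4):
        row = [cell_bound(p, d) for p in (2, 5, 10)]
        print(f"d={d}: {row}")
```

For fixed $d$ the bound is a polynomial in $p$ of degree $d-1$, so it grows quickly with the dimension.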

Abstract

We study the problem of sequential learning of the Pareto front in multi-objective multi-armed bandits. An agent is faced with $K$ possible arms to pull. At each turn she picks one and receives a vector-valued reward. When she thinks she has enough information to identify the Pareto front of the different arm means, she stops the game and gives an answer. We are interested in designing algorithms such that the answer given is correct with probability at least $1-\delta$. Our main contribution is an efficient implementation of an algorithm achieving the optimal sample complexity when the risk $\delta$ is small. With $K$ arms in $d$ dimensions, $p$ of which are in the Pareto set, the algorithm runs in time $O(Kp^d)$ per round.
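As an illustration of the object being identified, here is a minimal sketch that computes the Pareto set of a list of known mean vectors. It assumes a maximization convention in which one arm dominates another when it is at least as large in every coordinate and strictly larger in at least one; the paper's exact dominance convention may differ. The function names and the example means are hypothetical, and the sketch ignores the sequential, noisy setting entirely.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Assumed convention: a dominates b if a >= b in every coordinate
    and a > b in at least one coordinate (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_set(means: List[Sequence[float]]) -> List[int]:
    """Return the indices of arms whose mean is dominated by no other arm.

    Brute-force O(K^2 d) check, intended only as an illustration.
    """
    return [
        i
        for i, m in enumerate(means)
        if not any(dominates(other, m) for j, other in enumerate(means) if j != i)
    ]


# Hypothetical example: K = 4 arms with rewards in R^2.
means = [(1.0, 0.0), (0.0, 1.0), (0.4, 0.4), (0.6, 0.6)]
front = pareto_set(means)
print(front)
```

In this toy instance the arm with mean (0.4, 0.4) is dominated by the arm with mean (0.6, 0.6), so $p = 3$ of the $K = 4$ arms are Pareto optimal.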

Paper Structure

This paper contains 19 sections, 11 theorems, 42 equations, 6 figures, 3 tables.

Key Result

Proposition 1

Given a set of models $M$, a finite set of disjoint hypotheses $\mathcal{H} = (\mathcal{H}_i)_{i \in [n]}$ which is a partition of $M$ and a risk parameter $\delta>0$, any $\delta$-PAC strategy is such that for every $\nu \in M$: where

Figures (6)

  • Figure 1: The original model is drawn in turquoise (circle). We start by moving the point $0$ to a new location. Then we move the points that are still in its all-positive orthant outside of it with respect to the dimension where the move is the shortest (brown stars).
  • Figure 2: An example of cell construction in 2d with three valid cells and an empty one
  • Figure 3: Empirical distribution of the number of samples used to identify the two Pareto optimal points
  • Figure 4: Time to solve the minimization problem on a random point cloud with $p$ Pareto points in dim. $d$
  • Figure 5: Example of a point being added in dimension 3
  • ...and 1 more figure

Theorems & Definitions (19)

  • Proposition 1: Sample complexity lower bound
  • Theorem 1: Algorithmic complexity of the minimal transportation cost
  • Lemma 1: Splitting the domain
  • Lemma 2: Cost of removing a point from the Pareto set
  • proof
  • Lemma 3: Cost of adding a point to the Pareto set
  • Example
  • Lemma 4
  • Lemma 5
  • proof: Proof of Lemma 1 (Splitting the domain)
  • ...and 9 more