Sequential Learning of the Pareto Front for Multi-objective Bandits

Elise Crépon; Aurélien Garivier; Wouter M Koolen

TL;DR

The paper addresses fixed-confidence identification of the Pareto front in multi-objective bandits with $K$ arms and rewards in $\mathbb{R}^d$, where the returned answer must be correct with probability at least $1-\delta$. It adapts the Track-and-Stop framework to Pareto-front identification and solves the inner optimization problem of the information-theoretic lower bound by online gradient ascent, focusing on Gaussian arms with identity covariance. A key contribution is the decomposition of the transport cost into two cases, removing a point from the Pareto set or adding one to it, together with a cell-based algorithm that achieves instance-optimal sample complexity at a per-round cost of $O(K p^d)$, where $p$ is the number of arms in the Pareto set. The authors derive explicit costs for removal and develop a tractable procedure for addition, including a cell enumeration strategy with a combinatorial bound of $\binom{p+d-1}{d-1}$. Empirical results on real and synthetic data show substantial improvements in sample efficiency and give practical insight into scalability for moderate $p$ and $d$.
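To give a feel for how the quoted combinatorial bound grows, the following minimal sketch evaluates $\binom{p+d-1}{d-1}$ for a few values of $p$ and $d$. The function name `cell_bound` and the chosen values are illustrative assumptions, not taken from the paper, and the snippet only evaluates the formula as stated in the summary.

```python
from math import comb


def cell_bound(p: int, d: int) -> int:
    """Evaluate the bound C(p + d - 1, d - 1) quoted in the summary.

    p: number of arms in the Pareto set (assumed >= 1)
    d: dimension of the reward vectors (assumed >= 1)
    """
    return comb(p + d - 1, d - 1)


if __name__ == "__main__":
    # Hypothetical (p, d) pairs chosen only for illustration.
    for p, d in [(5, 2), (5, 3), (10, 3)]:
        print(f"p={p}, d={d}: bound = {cell_bound(p, d)}")
```

For fixed $d$ the bound is polynomial in $p$ with degree $d-1$, which fits the summary's remark that the approach is practical for moderate $p$ and $d$.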

Abstract

We study the problem of sequential learning of the Pareto front in multi-objective multi-armed bandits. An agent is faced with $K$ possible arms to pull. At each round she picks one arm and receives a vector-valued reward. When she thinks she has enough information to identify the Pareto front of the arm means, she stops the game and gives an answer. We are interested in designing algorithms such that the answer given is correct with probability at least $1-\delta$. Our main contribution is an efficient implementation of an algorithm achieving the optimal sample complexity when the risk $\delta$ is small. With $K$ arms in $d$ dimensions, $p$ of which are in the Pareto set, the algorithm runs in time $O(Kp^d)$ per round.
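As an illustration of the object being identified, here is a minimal sketch that computes the Pareto set of a list of known mean vectors. It assumes a maximization convention in which one arm dominates another when it is at least as good in every coordinate and strictly better in at least one; the paper's exact dominance definition may differ, and the names `dominates` and `pareto_set` and the example means are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is >= b in every coordinate and > b in at least one (assumed convention)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_set(means: Sequence[Sequence[float]]) -> List[int]:
    """Indices of arms whose mean vector is not dominated by any other arm."""
    return [
        i
        for i, m in enumerate(means)
        if not any(dominates(other, m) for j, other in enumerate(means) if j != i)
    ]


if __name__ == "__main__":
    # Hypothetical means for K = 5 arms in d = 2 dimensions.
    means = [(1.0, 0.2), (0.6, 0.6), (0.2, 1.0), (0.5, 0.5), (0.1, 0.1)]
    print(pareto_set(means))  # arms 3 and 4 are dominated by arm 1
```

In the bandit setting the means are unknown, so the agent must sample arms until this set can be reported with confidence $1-\delta$. This brute-force check takes $O(K^2 d)$ time and is unrelated to the paper's $O(Kp^d)$ per-round procedure.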
