Multi-Player Approaches for Dueling Bandits

Or Raveh; Junya Honda; Masashi Sugiyama

TL;DR

This work addresses learning in a distributed dueling-bandit setting with multiple cooperating players and a Condorcet Winner (CW). It introduces two complementary algorithms: a Follow Your Leader Black Box (FYLBB) that can leverage any single-player dueling-bandit base algorithm, and a fully distributed message-passing RUCB (MP-RUCB) that uses CW recommendations to accelerate exploration. The authors prove an asymptotic lower bound of order $O(K \log T)$ that is independent of the number of players $M$, and they show both algorithms achieve matching upper bounds, with fast non-asymptotic CW identification in the distributed setting. Experiments on real preference data demonstrate that multiplayer approaches outperform single-player baselines, highlighting the gains from cooperative exploration and CW-driven communication in noninvasive, preference-based feedback scenarios such as ranking or tuning large models.
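As a minimal illustration of the Condorcet Winner assumption mentioned above, the sketch below checks whether a preference matrix has a CW, that is, an arm that beats every other arm with probability greater than 1/2. The function name `condorcet_winner` and the 3-arm matrix `Q_EXAMPLE` are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def condorcet_winner(Q: Sequence[Sequence[float]]) -> Optional[int]:
    """Return the index of the arm that beats every other arm with
    probability > 1/2, or None if no such arm exists.

    Q[i][j] is read as the probability that arm i wins a duel against arm j.
    """
    K = len(Q)
    for i in range(K):
        if all(Q[i][j] > 0.5 for j in range(K) if j != i):
            return i
    return None


# Hypothetical preference matrix (illustrative values only).
Q_EXAMPLE = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.8],
    [0.3, 0.2, 0.5],
]

# A cyclic matrix (0 beats 1, 1 beats 2, 2 beats 0) has no Condorcet Winner.
Q_CYCLE = [
    [0.5, 0.6, 0.4],
    [0.4, 0.5, 0.6],
    [0.6, 0.4, 0.5],
]

print(condorcet_winner(Q_EXAMPLE))
print(condorcet_winner(Q_CYCLE))
```

In `Q_EXAMPLE`, arm 0 wins each of its pairwise comparisons with probability above 1/2, so it is the CW. `Q_CYCLE` shows why the assumption matters: a preference cycle leaves no single best arm.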

Abstract

Various approaches have emerged for multi-armed bandits in distributed systems. The multiplayer dueling bandit problem, common in scenarios with only preference-based information like human feedback, introduces challenges related to controlling collaborative exploration of non-informative arm pairs, but has received little attention. To fill this gap, we demonstrate that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting when utilizing known dueling bandit algorithms as a foundation. Additionally, we analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol, resulting in expedited exploration in many cases. Our experimental comparisons reveal that our multiplayer algorithms surpass single-player benchmark algorithms, underscoring their efficacy in addressing the nuanced challenges of the multiplayer dueling bandit setting.
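To make the notion of group performance concrete, the sketch below computes a group regret under an assumed, commonly used dueling-bandit convention: a duel between arms $i$ and $j$ costs the average of the CW's winning margins over $i$ and $j$, and the group regret sums this cost over all players and rounds. The paper's exact definition is not reproduced in this summary, so this convention, the helper names, and the matrix values are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple

Duel = Tuple[int, int]


def duel_regret(Q: Sequence[Sequence[float]], cw: int, i: int, j: int) -> float:
    """Assumed per-duel regret: average margin by which the CW beats arms i and j."""
    return 0.5 * ((Q[cw][i] - 0.5) + (Q[cw][j] - 0.5))


def group_regret(Q: Sequence[Sequence[float]], cw: int,
                 duels_per_round: List[List[Duel]]) -> float:
    """Sum the per-duel regret over every round and every player.

    duels_per_round[t] holds one (i, j) pair per player for round t.
    """
    return sum(
        duel_regret(Q, cw, i, j)
        for round_duels in duels_per_round
        for (i, j) in round_duels
    )


# Hypothetical 3-arm preference matrix with arm 0 as the Condorcet Winner.
Q = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.8],
    [0.3, 0.2, 0.5],
]

# Two players, two rounds: dueling the CW against itself costs nothing,
# while exploring a pair of suboptimal arms costs the most.
history = [
    [(0, 0), (1, 2)],
    [(0, 1), (0, 0)],
]

total = group_regret(Q, cw=0, duels_per_round=history)
print(f"group regret: {total:.3f}")
```

Under this convention, every player who explores a non-informative pair adds to the total, which is why uncoordinated exploration across $M$ players can inflate group regret.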

Paper Structure (24 sections, 13 theorems, 99 equations, 14 figures, 3 algorithms)

INTRODUCTION
PROBLEM FORMULATION
ASYMPTOTIC LOWER BOUND
FOLLOW YOUR LEADER BLACK BOX ALGORITHM
Proof Outline
A FULLY DISTRIBUTED APPROACH
Non-dominant Regret Term
Proof Outline
EXPERIMENTS
Algorithmic Comparisons Across Datasets
Experiments with a Varying Number of Players
Comparisons with a Single Player
Comparisons Across Different Graph Structures
CONCLUSION AND FUTURE WORK
RELATED WORK
...and 9 more sections

Key Result

Theorem 3.1

For any consistent algorithm on $\mathcal{Q}_{\mathrm{CW}}$ and $Q \in \mathcal{Q}_{\mathrm{CW}}$, the group regret obeys an asymptotic lower bound of order $K \log T$ that is independent of the number of players $M$.

Figures (14)

Figure 1: Six Rankers
Figure 2: Sushi
Figure 3: Irish
Figure 5: Message Passing RUCB
Figure 6: Follow Your Leader RUCB
...and 9 more figures

Theorems & Definitions (31)

Theorem 3.1
Theorem 4.2
Corollary 4.3
Lemma 5.1
Theorem 5.2
Remark 5.3
Definition C.1
Definition C.2
Definition C.4
Definition C.5
...and 21 more
