Non-Stationary Dueling Bandits Under a Weighted Borda Criterion

Joe Suk; Arpit Agarwal

TL;DR

The paper addresses non-stationary dueling bandits with a focus on the Borda winner, introducing a Weighted Borda Score (WBS) framework that unifies Borda and Condorcet objectives. It proves the first adaptive dynamic regret bound for Borda, $\tilde{O}(\tilde{L}^{1/3} K^{1/3} T^{2/3})$, where $\tilde{L}$ counts significant Borda switches, and provides complementary $V_T$-dependent bounds, all without prior knowledge of non-stationarity. Under the General Identifiability Condition (GIC), it also derives adaptive Condorcet bounds in terms of $S_{approx}$ and $V_T$, demonstrating improved rates in regimes with many arms or many spurious winner changes. The algorithms BOSSE and METABOSSE implement the Weighted Borda approach, combining soft-elimination with time-varying exploration and a hierarchical replay mechanism to achieve adaptivity and optimality. Collectively, the work offers a unified, adaptive framework for both Borda and Condorcet dynamic regret and opens avenues for further refinement under weaker identifiability conditions.
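The TL;DR pairs Borda-score estimation with elimination. To make that pairing concrete, here is a minimal Python sketch. It assumes a stationary preference matrix `P`, where `P[i][j]` is the probability that arm `i` beats arm `j`. It also assumes a Borda score defined as the mean win probability against an opponent drawn uniformly from all $K$ arms. The names `borda_scores` and `elimination_borda` are hypothetical. The loop is plain successive elimination with Hoeffding confidence radii, not BOSSE itself: it omits the weighted scores, the soft-elimination and time-varying exploration schedule, and the replay mechanism of METABOSSE.

```python
import math
import random


def borda_scores(P):
    """Borda score of each arm: mean win probability against an opponent drawn
    uniformly from all K arms (a self-duel counts as a 1/2 win). Assumed definition."""
    K = len(P)
    return [sum(row) / K for row in P]


def elimination_borda(P, T, delta=0.01, seed=0):
    """Generic successive elimination on estimated Borda scores (not BOSSE).

    Assumes a fixed preference matrix P, P[i][j] = Pr(arm i beats arm j).
    Returns the arms still active after T duels.
    """
    K = len(P)
    if T < K:
        raise ValueError("need at least one duel per arm (T >= K)")
    rng = random.Random(seed)
    active = list(range(K))
    wins = [0.0] * K
    plays = [0] * K
    log_term = math.log(2 * K * T / delta)  # union bound over K arms, <= T plays each

    def radius(a):
        # Hoeffding confidence radius for an average of plays[a] outcomes in [0, 1].
        return math.sqrt(log_term / (2 * plays[a]))

    t = 0
    while t < T:
        # One round-robin pass: each active arm duels a uniformly random opponent,
        # so its win indicator is an unbiased sample of its Borda score.
        for i in active:
            if t >= T:
                break
            j = rng.randrange(K)
            wins[i] += 1.0 if rng.random() < P[i][j] else 0.0
            plays[i] += 1
            t += 1
        means = {a: wins[a] / plays[a] for a in active}
        leader = max(active, key=means.get)
        # Drop arms whose estimated Borda gap to the leader exceeds both radii.
        active = [a for a in active
                  if means[leader] - means[a] <= radius(leader) + radius(a)]
    return active


# Hypothetical 3-arm instance in which arm 0 has a large Borda advantage.
P = [[0.5, 0.9, 0.9],
     [0.1, 0.5, 0.5],
     [0.1, 0.5, 0.5]]
print(borda_scores(P))
print(elimination_borda(P, T=3000, seed=1))  # arms 1 and 2 should be dropped well before T
```

The key design choice is the uniform opponent draw. Under the assumed Borda definition, it makes each observed outcome an unbiased sample of the played arm's score. A non-stationary method must additionally re-explore, because an eliminated arm can become the winner later.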

Abstract

In $K$-armed dueling bandits, the learner receives preference feedback between arms, and the regret of an arm is defined in terms of its suboptimality to a $\textit{winner}$ arm. The $\textit{non-stationary}$ variant of the problem, motivated by concerns of changing user preferences, has received recent interest (Saha and Gupta, 2022; Buening and Saha, 2023; Suk and Agarwal, 2023). The goal here is to design algorithms with low $\textit{dynamic regret}$, ideally without foreknowledge of the amount of change. The notion of regret here is tied to a notion of winner arm, most typically taken to be a so-called Condorcet winner or a Borda winner. However, the aforementioned results mostly focus on the Condorcet winner. In comparison, the Borda version of this problem has received less attention, and it is the focus of this work. We establish the first optimal and adaptive dynamic regret upper bound $\tilde{O}(\tilde{L}^{1/3} K^{1/3} T^{2/3})$, where $\tilde{L}$ is the unknown number of significant Borda winner switches. We also introduce a novel $\textit{weighted Borda score}$ framework which generalizes both the Borda and Condorcet problems. This framework surprisingly allows a Borda-style regret analysis of the Condorcet problem and establishes improved bounds over the theoretical state of the art in regimes with a large number of arms or many spurious changes in the Condorcet winner. Such a generalization was not known and could be of independent interest.
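The claim that a weighted Borda score generalizes both winner notions can be illustrated with a small Python sketch. The exact form is an assumption for illustration, not the paper's definition. Under an opponent distribution $w$, the score of arm $a$ is $s_w(a) = \sum_j w_j P(a, j)$. The regret of dueling the pair $(i, j)$ is $\max_a s_w(a) - (s_w(i) + s_w(j))/2$, one common convention for dueling regret. Uniform $w$ recovers the Borda score. A point mass on a Condorcet winner $c$ gives $s_w(a) = P(a, c) < 1/2 = s_w(c)$ for every $a \neq c$, so $c$ is the maximizer. Dynamic regret sums the per-round regret, with `P` and `w` allowed to change every round. All function names are hypothetical.

```python
def weighted_borda(P, w):
    """Score of each arm a: sum_j w[j] * P[a][j], with w a distribution over opponents."""
    K = len(P)
    return [sum(w[j] * P[a][j] for j in range(K)) for a in range(K)]


def pair_regret(P, w, i, j):
    """Regret of dueling (i, j): gap from the best score to the average of the two arms."""
    s = weighted_borda(P, w)
    return max(s) - (s[i] + s[j]) / 2


def dynamic_regret(Ps, ws, pairs):
    """Sum of per-round regrets; P and w may change every round (non-stationarity)."""
    return sum(pair_regret(P, w, i, j) for P, w, (i, j) in zip(Ps, ws, pairs))


def condorcet_winner(P):
    """Arm that beats every other arm with probability > 1/2, or None if none exists."""
    K = len(P)
    for a in range(K):
        if all(P[a][b] > 0.5 for b in range(K) if b != a):
            return a
    return None


def argmax(xs):
    return max(range(len(xs)), key=lambda a: xs[a])


# Example: arm 0 narrowly beats everyone, while arm 1 crushes arm 2.
P = [
    [0.50, 0.51, 0.51],
    [0.49, 0.50, 0.99],
    [0.49, 0.01, 0.50],
]
uniform = [1 / 3] * 3         # Borda weighting
point_mass = [1.0, 0.0, 0.0]  # all weight on the Condorcet winner (arm 0)

print(condorcet_winner(P))                    # 0
print(argmax(weighted_borda(P, uniform)))     # 1 (Borda winner)
print(argmax(weighted_borda(P, point_mass)))  # 0
```

In this instance the Condorcet winner (arm 0) and the Borda winner (arm 1) differ. The choice of weights therefore decides which arm incurs zero regret.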

Paper Structure (44 sections, 24 theorems, 144 equations, 4 figures, 2 tables)

Introduction
Tabular Summary of Contributions
Setup -- Non-stationary Dueling Bandits
Non-Stationarity Measures
Dynamic Regret Lower Bounds
Dynamic Regret Upper Bounds
Borda Dueling Bandits
Condorcet Dueling Bandits
A New Unified View of Condorcet and Borda Regret
Algorithmic Design
Base Algorithm -- Soft Elimination with WBS
Non-Stationary Meta-Algorithm
Challenges of Condorcet Regret Analysis
Conclusion and Future Questions
Setting up the Weighted Borda Problem
...and 29 more sections

Key Result

Theorem 2

Meta-BOSSE (the non-stationary meta-algorithm) with the fixed weight specification (see the definition of the weight specification) satisfies:

Figures (4)

Figure 1: GIC vs. SST, STI.
Figure 2: Glossary of Non-Stationarity Measures
Figure 3: BOSSE$(t_{\text{start}}, m_0)$: (Weighted) BOrda Score Soft Elimination
Figure 4: Meta-BOSSE

Theorems & Definitions (65)

Definition 1
Theorem 2
Corollary 3
Definition 4: Approximate Winner Changes
Theorem 5
Remark 1
Definition 1: Significant Winner Switches w.r.t. Known Weightings (SKW)
Remark 2
Definition 2: Significant Winner Switches w.r.t. Unknown Weightings (SUW)
Remark 1
...and 55 more
