Ranking In Generalized Linear Bandits

Amitis Shidani; George Deligiannidis; Arnaud Doucet

TL;DR

The paper tackles ranking in generalized linear bandits where rewards depend on both items and their positions. It introduces an L-layered graph framework that maps ordered lists to paths, turning ranking into a longest-path problem on a DAG, and develops RankUCB and RankTS (with genRankUCB variants) to learn optimistic edge weights under partial information about position-specific parameters. Theoretical guarantees show a regret of order $\widetilde{O}(L \sqrt{d T})$, accounting for both item similarities and position dependencies. Empirical results demonstrate that modeling position effects improves performance over baselines that ignore such dependencies, indicating practical impact for recommendation and ranking systems.

Abstract

We study the ranking problem in generalized linear bandits. At each time, the learning agent selects an ordered list of items and observes stochastic outcomes. In recommendation systems, displaying an ordered list of the most attractive items is not always optimal as both position and item dependencies result in a complex reward function. A very naive example is the lack of diversity when all the most attractive items are from the same category. We model the position and item dependencies in the ordered list and design UCB and Thompson Sampling type algorithms for this problem. Our work generalizes existing studies in several directions, including position dependencies where position discount is a particular case, and connecting the ranking problem to graph theory.
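The longest-path view of ranking described in the TL;DR can be illustrated with a small dynamic program. This is a minimal sketch under stated assumptions, not the paper's algorithm: every layer is assumed to hold one node per item, `start` and `weights` are hypothetical inputs standing in for (optimistic) edge scores, and neither item distinctness nor the validity conditions of the paper's graph definition are enforced.

```python
def longest_layered_path(start, weights):
    """Return (path, total) for the heaviest path through a layered DAG.

    start:   scores of the nodes in the first layer (hypothetical input).
    weights: list of matrices; weights[l][i][j] is the score of the edge
             from node i in layer l to node j in layer l + 1 (assumed layout).
    """
    best = [float(s) for s in start]  # best score of any path ending at each node
    back = []                         # back-pointers, one list per transition
    for w in weights:
        new_best, ptr = [], []
        for j in range(len(w[0])):
            # Best predecessor in the previous layer for node j.
            i_star = max(range(len(best)), key=lambda i: best[i] + w[i][j])
            ptr.append(i_star)
            new_best.append(best[i_star] + w[i_star][j])
        best = new_best
        back.append(ptr)

    # Pick the best end node, then follow back-pointers to recover the path.
    node = max(range(len(best)), key=lambda j: best[j])
    total = best[node]
    path = [node]
    for ptr in reversed(back):
        node = ptr[node]
        path.append(node)
    path.reverse()
    return path, total


# Toy instance: 3 items, 3 positions, made-up scores.
toy_start = [0, 1, 0]
toy_weights = [
    [[0, 0, 5], [0, 0, 1], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 2, 0]],
]
toy_path, toy_total = longest_layered_path(toy_start, toy_weights)
print(toy_path, toy_total)
```

Because the graph is layered, each transition is processed once, so the cost is linear in the number of edges. In a bandit loop, the edge scores would be replaced each round by the learner's current optimistic estimates.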

Paper Structure (14 sections, 8 theorems, 65 equations, 4 figures, 1 table, 4 algorithms)

Introduction
Our Contribution
Notation and Setting
The Graph-Based Approach for Ranking
Ranking UCB Algorithm
Experiments
Conclusion
RankUCB Proofs
Proof of Lemma \ref{lem:genlinconf}
Proof of Theorem \ref{thm:genlinucb}
Generalization of RankUCB: Estimating Position Dependencies
Ranking Thompson Sampling Algorithm
More Details On Experiments
Dependency on a Window of Previous Items

Key Result

Lemma 1

Let $\delta \in (0,1)$, and $\sqrt{\beta_t^l} = \sqrt{\lambda} \|\theta^l\|_2 + \sqrt{2\log\left(\frac{1}{\delta}\right) + \log\left(\frac{\det\left(V_t^l(\lambda)\right)}{\lambda^d}\right)}$. Define $\mathcal{C}_t^l$ as follows: Then, with probability at least $1-\delta$, it holds that for any time $t$, $\theta^l \in \mathcal{C}_t^l$; i.e., $\mathbb{P}(\exists t: \theta^l \notin \mathcal{C}_t^l) \leq \delta$.
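The expression for $\sqrt{\beta_t^l}$ in Lemma 1 can be evaluated numerically as follows. This is a hedged sketch: it assumes $V_t^l(\lambda)$ is the usual regularized design matrix $\lambda I + \sum_s x_s x_s^\top$, which is not stated in the text above, and it takes `theta_norm` as a stand-in for $\|\theta^l\|_2$ (in practice one would typically plug in a known upper bound). The function names are illustrative only.

```python
import math


def log_det_spd(a):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    # det(A) = prod(diag(chol))^2, so log det(A) = 2 * sum(log(diag(chol))).
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))


def sqrt_beta(features, theta_norm, lam, delta):
    """Evaluate sqrt(beta_t^l) from Lemma 1 for observed feature vectors.

    Assumption: V = lam * I + sum of outer products x x^T over `features`.
    """
    d = len(features[0])
    v = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    for x in features:
        for i in range(d):
            for j in range(d):
                v[i][j] += x[i] * x[j]
    # log(det(V) / lam^d), computed in log space for numerical stability.
    log_ratio = log_det_spd(v) - d * math.log(lam)
    return math.sqrt(lam) * theta_norm + math.sqrt(
        2.0 * math.log(1.0 / delta) + log_ratio
    )


radius = sqrt_beta([[1.0, 0.0], [0.0, 1.0]], theta_norm=1.0, lam=1.0, delta=0.05)
print(radius)
```

Working with the log-determinant directly avoids overflow when $d$ or $t$ is large. The radius grows only logarithmically with the determinant, which is how the dimension and the number of rounds enter the bound.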

Figures (4)

Figure 1: An illustration of a valid (\ref{fig:graph-valid}) and an invalid (\ref{fig:graph-invalid}) $3$-layered graph. The graph in \ref{fig:graph-invalid} is invalid due to the red edges, which violate conditions (a) and (b) of Definition \ref{def}.
Figure 2: Expected regret for $K = 100$. Left: $w_l = 0 \;\; \forall l \in [L]$. Right: $\max_{l \in [L]} |w_l| = 10$.
Figure 3: Expected regret for $d = 10$ and $L = 4$.
Figure 4: Robustness of the algorithms in the presence of non-subgaussian noise, with $\max_{l \in [L]} |w_l| = 1$ and $K = 10$.

Theorems & Definitions (15)

Definition 1
Lemma 1
Theorem 1
proof
Lemma 2
proof
Lemma 3
proof
proof
Theorem 2
...and 5 more
