Ranking In Generalized Linear Bandits

Amitis Shidani, George Deligiannidis, Arnaud Doucet

TL;DR

The paper tackles ranking in generalized linear bandits where rewards depend on both items and their positions. It introduces an $L$-layered graph framework that maps ordered lists to paths, turning ranking into a longest-path problem on a DAG, and develops RankUCB and RankTS (with genRankUCB variants) to estimate the edge weights (optimistically, in the UCB-based variants) under partial information about position-specific parameters. Theoretical guarantees show a regret of order $\widetilde{O}(L \sqrt{d T})$, accounting for both item similarities and position dependencies. Empirical results show that modeling position effects improves performance over baselines that ignore such dependencies, suggesting practical value for recommendation and ranking systems.
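The reduction from ranking to a longest-path search can be illustrated with a small sketch. It assumes the layered graph has already been built as an explicit DAG with a source node "s", a sink node "t", and one node (l, item) per item at each position l; the weights, node names, and the `longest_path` helper are hypothetical and are not the paper's construction. Distinct items are enforced here only by omitting edges (with two positions, no edge links an item to itself); in a deeper graph, repeated items must be ruled out by how the graph itself is constructed.

```python
import math
from graphlib import TopologicalSorter


def longest_path(edges, source, sink):
    """Longest source-to-sink path in a DAG.

    edges: dict mapping (u, v) -> weight. Returns (total_weight, path),
    or (-inf, None) if the sink is unreachable from the source.
    """
    # TopologicalSorter expects a mapping node -> set of predecessors.
    preds = {}
    for u, v in edges:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)

    best = {node: -math.inf for node in preds}
    parent = {}
    best[source] = 0.0
    # Dynamic programming: relax incoming edges in topological order.
    for v in TopologicalSorter(preds).static_order():
        for u in preds[v]:
            if best[u] == -math.inf:
                continue  # u is not reachable from the source
            cand = best[u] + edges[(u, v)]
            if cand > best[v]:
                best[v], parent[v] = cand, u

    if best[sink] == -math.inf:
        return -math.inf, None
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return best[sink], path[::-1]


# Hypothetical 2-position example with diversity-style weights.
items = ["A", "B", "C"]
score_pos1 = {"A": 3.0, "B": 2.0, "C": 1.0}          # weight of an item at position 1
score_pos2 = {("A", "B"): 0.5, ("A", "C"): 2.0,      # weight of an item at position 2,
              ("B", "A"): 1.0, ("B", "C"): 1.5,      # given the item shown above it
              ("C", "A"): 2.5, ("C", "B"): 1.0}
edges = {("s", (1, i)): score_pos1[i] for i in items}
edges.update({((1, i), (2, j)): w for (i, j), w in score_pos2.items()})
edges.update({((2, j), "t"): 0.0 for j in items})

total, path = longest_path(edges, "s", "t")
print(total, [node[1] for node in path[1:-1]])  # 5.0 ['A', 'C']
```

Because the graph is acyclic, one pass in topological order suffices, so the cost is linear in the number of edges. Under these made-up weights the best list puts C, not the second-most attractive item B, in position 2, which mirrors the diversity example in the abstract.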

Abstract

We study the ranking problem in generalized linear bandits. At each time step, the learning agent selects an ordered list of items and observes stochastic outcomes. In recommendation systems, displaying an ordered list of the most attractive items is not always optimal, because both position and item dependencies result in a complex reward function. A simple example is the lack of diversity when all of the most attractive items come from the same category. We model the position and item dependencies in the ordered list and design UCB- and Thompson Sampling-type algorithms for this problem. Our work generalizes existing studies in several directions, including position dependencies (of which position discounting is a special case) and a connection between the ranking problem and graph theory.
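As a rough illustration of what a Thompson Sampling-type step looks like, the sketch below shows generic Gaussian Thompson sampling for a single position-specific parameter vector in the linear (identity-link) case; it is not the paper's RankTS. It keeps a regularized Gram matrix $V = \lambda I + \sum_s x_s x_s^\top$, forms a ridge estimate $\hat\theta$, draws $\tilde\theta \sim \mathcal{N}(\hat\theta, v^2 V^{-1})$, and scores items with the sample. The class name `LinearTS`, the scale `v`, and the Gaussian posterior are assumptions made for illustration.

```python
import numpy as np


class LinearTS:
    """Hypothetical helper: Gaussian Thompson sampling for one position's parameter.

    Assumes a linear (identity-link) reward model; this is not the paper's RankTS.
    """

    def __init__(self, d, lam=1.0, v=1.0, seed=0):
        self.V = lam * np.eye(d)   # regularized Gram matrix: lam*I + sum x x^T
        self.b = np.zeros(d)       # running sum of reward * x
        self.v = v                 # posterior scale (assumed tuning parameter)
        self.rng = np.random.default_rng(seed)

    def update(self, x, reward):
        """Add one observed (feature vector, reward) pair."""
        x = np.asarray(x, dtype=float)
        self.V += np.outer(x, x)
        self.b += reward * x

    def sample_scores(self, X):
        """Score each row of X (one feature vector per item) with a posterior sample."""
        theta_hat = np.linalg.solve(self.V, self.b)   # ridge estimate
        cov = self.v ** 2 * np.linalg.inv(self.V)
        theta_tilde = self.rng.multivariate_normal(theta_hat, cov)
        return np.asarray(X, dtype=float) @ theta_tilde


ts = LinearTS(d=2, v=0.1)
for _ in range(10):
    ts.update([1.0, 0.0], reward=1.0)
scores = ts.sample_scores([[1.0, 0.0], [0.0, 1.0]])
best_item = int(np.argmax(scores))
```

Randomness in the sample, rather than an explicit confidence bonus, is what drives exploration here; a smaller `v` makes the agent act more greedily on the ridge estimate.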

Paper Structure

This paper contains 14 sections, 8 theorems, 65 equations, 4 figures, 1 table, 4 algorithms.

Key Result

Lemma 1

Let $\delta \in (0,1)$ and $\sqrt{\beta_t^l} = \sqrt{\lambda}\, \|\theta^l\|_2 + \sqrt{2\log\left(\frac{1}{\delta}\right) + \log\left(\frac{\det\left(V_t^l(\lambda)\right)}{\lambda^d}\right)}$. Define $\mathcal{C}_t^l$ as follows: … Then, with probability at least $1-\delta$, $\theta^l \in \mathcal{C}_t^l$ holds for every time $t$; i.e., $\mathbb{P}(\exists t: \theta^l \notin \mathcal{C}_t^l) \le \delta$.
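Since this excerpt omits the explicit form of $\mathcal{C}_t^l$, the sketch below assumes the standard ellipsoid used in OFUL-style linear bandits, $\{\theta : \|\theta - \hat{\theta}_t^l\|_{V_t^l(\lambda)} \le \sqrt{\beta_t^l}\}$, where $\hat{\theta}_t^l$ denotes a ridge estimate (notation assumed here). It computes $\sqrt{\beta_t^l}$ as in Lemma 1, replacing the unknown $\|\theta^l\|_2$ with an assumed known upper bound `S`, and then the optimistic value of $x^\top\theta$ over that ellipsoid, $x^\top \hat{\theta}_t^l + \sqrt{\beta_t^l}\,\|x\|_{(V_t^l(\lambda))^{-1}}$. The function names and data are hypothetical.

```python
import numpy as np


def sqrt_beta(V, lam, S, delta):
    """sqrt(beta_t^l) from Lemma 1, with ||theta^l||_2 replaced by an assumed bound S."""
    d = V.shape[0]
    _, logdet = np.linalg.slogdet(V)            # numerically stable log det(V)
    log_det_ratio = logdet - d * np.log(lam)    # log(det(V) / lam^d)
    return np.sqrt(lam) * S + np.sqrt(2.0 * np.log(1.0 / delta) + log_det_ratio)


def optimistic_value(x, theta_hat, V, radius):
    """Maximum of x^T theta over {theta : ||theta - theta_hat||_V <= radius}."""
    x = np.asarray(x, dtype=float)
    norm_x_Vinv = np.sqrt(x @ np.linalg.solve(V, x))   # ||x||_{V^{-1}}
    return x @ theta_hat + radius * norm_x_Vinv


# Ridge estimate from a few hypothetical observations at one position.
lam, d = 1.0, 2
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
r = np.array([1.0, 0.0, 1.0])
V = lam * np.eye(d) + X.T @ X
theta_hat = np.linalg.solve(V, X.T @ r)
radius = sqrt_beta(V, lam, S=1.0, delta=0.05)
print(optimistic_value([1.0, 0.0], theta_hat, V, radius))
```

Using `slogdet` instead of `det` avoids overflow once $V_t^l(\lambda)$ has accumulated many observations, and solving against $V$ avoids forming its inverse explicitly.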

Figures (4)

  • Figure 1: An illustration of a valid and an invalid $3$-layered graph. In the invalid graph, the red edges violate conditions (a) and (b) of the $L$-layered graph definition.
  • Figure 2: Expected regret for $K = 100$. Left: $w_l = 0$ for all $l \in [L]$; right: $\max_{l \in [L]} |w_l| = 10$.
  • Figure 3: Expected regret for $d = 10$ and $L = 4$.
  • Figure 4: Robustness of the algorithms in the presence of non-sub-Gaussian noise, with $\max_{l \in [L]} |w_l| = 1$ and $K = 10$.

Theorems & Definitions (15)

  • Definition 1
  • Lemma 1
  • Theorem 1
  • Proof
  • Lemma 2
  • Proof
  • Lemma 3
  • Proof
  • Proof
  • Theorem 2
  • ...and 5 more