Contextual Multinomial Logit Bandits with General Value Functions

Mengxiao Zhang, Haipeng Luo

TL;DR

This work considers contextual MNL bandits with a general value function class that contains the ground truth, borrowing ideas from a recent trend of studies on contextual bandits, and proposes a suite of algorithms, each with a different computation-regret trade-off.

Abstract

Contextual multinomial logit (MNL) bandits capture many real-world assortment recommendation problems such as online retailing/advertising. However, prior work has only considered (generalized) linear value functions, which greatly limits its applicability. Motivated by this fact, in this work, we consider contextual MNL bandits with a general value function class that contains the ground truth, borrowing ideas from a recent trend of studies on contextual bandits. Specifically, we consider both the stochastic and the adversarial settings, and propose a suite of algorithms, each with a different computation-regret trade-off. When applied to the linear case, our results are not only the first ones with no dependence on a certain problem-dependent constant that can be exponentially large, but also enjoy other advantages such as computational efficiency, dimension-free regret bounds, or the ability to handle completely adversarial contexts and rewards.

Paper Structure

This paper contains 24 sections, 36 theorems, 105 equations, 1 table, 1 algorithm.

Key Result

Lemma 0

The ERM strategy $\widehat{f}_D=\mathop{\mathrm{argmin}}\limits_{f\in{\mathcal{F}}}\sum_{(x,S,i)\in D}\ell_{\log}(\mu(S, f(x)), i)$ satisfies the generalization-error assumption (asm:gen_error) in the following two cases:
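
To make the ERM objective above concrete, here is a minimal Python sketch. It rests on several assumptions that are not stated in this excerpt: the standard MNL choice model, in which item $i \in S$ is chosen with probability $v_i / (1 + \sum_{j \in S} v_j)$ and the no-purchase option $0$ with probability $1 / (1 + \sum_{j \in S} v_j)$; $\ell_{\log}$ taken to be the negative log-likelihood of the observed choice; and a small finite function class searched by brute force. The names `mnl_probs`, `log_loss`, `erm`, `f_a`, and `f_b` are hypothetical, and this is an illustration rather than the paper's implementation.

```python
import math

NO_PURCHASE = 0  # assumption: index 0 denotes the no-purchase option


def mnl_probs(S, v):
    """Assumed MNL choice distribution over the assortment S plus no-purchase.

    S is an iterable of item ids (positive integers); v maps item id -> value.
    """
    denom = 1.0 + sum(v[j] for j in S)
    probs = {j: v[j] / denom for j in S}
    probs[NO_PURCHASE] = 1.0 / denom
    return probs


def log_loss(probs, i):
    """Negative log-likelihood of the observed choice i."""
    return -math.log(probs[i])


def erm(function_class, data):
    """Return the f in a finite class minimizing total log loss on data.

    data is a list of (x, S, i) triples: context, offered assortment, choice.
    """
    def total_loss(f):
        return sum(log_loss(mnl_probs(S, f(x)), i) for x, S, i in data)

    return min(function_class, key=total_loss)


# Two hypothetical candidate value functions that ignore the context.
def f_a(x):
    return {1: 0.9, 2: 0.1}


def f_b(x):
    return {1: 0.1, 2: 0.9}


# Toy data: item 1 is chosen far more often than item 2.
data = [(None, (1, 2), 1)] * 8 + [(None, (1, 2), NO_PURCHASE)] * 8 + [(None, (1, 2), 2)]
best = erm([f_a, f_b], data)
print(best.__name__)
```

On this toy data the candidate that assigns the higher value to item 1 incurs the smaller total log loss, so ERM selects it. For a richer function class the brute-force `min` would be replaced by whatever optimizer suits that class.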

Theorems & Definitions (56)

  • Lemma 0
  • Lemma 0
  • Lemma 0
  • Theorem 1
  • Corollary 2
  • Lemma 2
  • Theorem 3
  • Corollary 4
  • Lemma 4
  • Lemma 4
  • ...and 46 more