Contextual Multinomial Logit Bandits with General Value Functions

Mengxiao Zhang; Haipeng Luo

TL;DR

This work considers contextual MNL bandits with a general value function class that contains the ground truth, borrowing ideas from a recent line of work on contextual bandits, and proposes a suite of algorithms, each with a different computation-regret trade-off.

Abstract

Contextual multinomial logit (MNL) bandits capture many real-world assortment recommendation problems, such as online retailing and advertising. However, prior work has only considered (generalized) linear value functions, which greatly limits its applicability. Motivated by this, we consider contextual MNL bandits with a general value function class that contains the ground truth, borrowing ideas from a recent line of work on contextual bandits. Specifically, we consider both the stochastic and the adversarial settings and propose a suite of algorithms, each with a different computation-regret trade-off. When applied to the linear case, our results are not only the first with no dependence on a certain problem-dependent constant that can be exponentially large, but also enjoy other advantages such as computational efficiency, dimension-free regret bounds, or the ability to handle completely adversarial contexts and rewards.
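
To make the assortment setting concrete, the following is a minimal Python sketch of one round of an MNL choice model. It assumes the standard MNL form in which, given item values $v_1, \dots, v_N \ge 0$ and an assortment $S$, the user picks item $i \in S$ with probability $v_i / (1 + \sum_{j \in S} v_j)$ and makes no purchase (outcome 0) otherwise. The per-item rewards `r`, the size cap `K`, and the function names `choice_probs`, `expected_reward`, `best_assortment`, and `sample_choice` are hypothetical illustrations, not the paper's algorithms. The brute-force search only shows the objective being optimized; it enumerates every assortment of size at most `K`.

```python
import itertools
import random
from typing import List, Sequence, Tuple

# Assumed conventions: items are numbered 1..N, v[j - 1] and r[j - 1] are the
# value and reward of item j, and outcome 0 means "no purchase".


def choice_probs(S: Sequence[int], v: Sequence[float]) -> List[float]:
    """Return [P(no purchase), P(S[0]), P(S[1]), ...] under the assumed MNL model."""
    denom = 1.0 + sum(v[j - 1] for j in S)
    return [1.0 / denom] + [v[j - 1] / denom for j in S]


def expected_reward(S: Sequence[int], v: Sequence[float], r: Sequence[float]) -> float:
    """Expected reward of offering S: sum over i in S of r_i * P(user picks i)."""
    probs = choice_probs(S, v)
    return sum(r[j - 1] * p for j, p in zip(S, probs[1:]))


def best_assortment(v: Sequence[float], r: Sequence[float], K: int) -> Tuple[int, ...]:
    """Brute-force search over all non-empty assortments of size at most K."""
    N = len(v)
    candidates = [
        S for k in range(1, K + 1) for S in itertools.combinations(range(1, N + 1), k)
    ]
    return max(candidates, key=lambda S: expected_reward(S, v, r))


def sample_choice(S: Sequence[int], v: Sequence[float], rng: random.Random) -> int:
    """Sample the user's response: 0 for no purchase, otherwise an item in S."""
    probs = choice_probs(S, v)
    outcomes = [0] + list(S)
    return rng.choices(outcomes, weights=probs, k=1)[0]
```

For example, with values `[0.5, 0.5, 0.5]`, rewards `[1.0, 0.2, 0.9]` and `K = 2`, `best_assortment` returns `(1, 3)`, whose expected reward is $0.25 \cdot 1.0 + 0.25 \cdot 0.9 = 0.475$; offering the low-reward item 2 would divert purchase probability away from the more rewarding items.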

Paper Structure (24 sections, 36 theorems, 105 equations, 1 table, 1 algorithm)

Introduction
Contributions.
Related works.
Notations and Preliminary
Notations.
Contextual MNL Bandits with Stochastic Contexts and Rewards
A Simple and Efficient Algorithm via Uniform Exploration
Better Exploration Leads to Better Regret
Contextual MNL Bandits with Adversarial Contexts and Rewards
First Approach: Reduction to Online Regression
Uniform Exploration.
Better Exploration.
Second Approach: Feel-Good Thompson Sampling
Additional Related Works
Omitted Details in "Contextual MNL Bandits with Stochastic Contexts and Rewards"
...and 9 more sections

Key Result

Lemma 0

The ERM strategy $\widehat{f}_D=\mathop{\mathrm{argmin}}\limits_{f\in{\mathcal{F}}}\sum_{(x,S,i)\in D}\ell_{\log}(\mu(S, f(x)), i)$ satisfies the generalization-error assumption (asm:gen_error) in the following two cases:
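
The ERM strategy above picks the function in $\mathcal{F}$ whose predicted MNL choice probabilities have the smallest cumulative log loss on the dataset $D$. A minimal Python sketch, assuming a small finite class $\mathcal{F}$ (a list of Python callables), the MNL form $\mu_i(S, v) = v_i / (1 + \sum_{j \in S} v_j)$ with $\mu_0(S, v) = 1 / (1 + \sum_{j \in S} v_j)$ for the no-purchase outcome, and $\ell_{\log}(\mu, i) = -\log \mu_i$. The names `mnl_prob`, `log_loss`, and `erm` are hypothetical, and the probability floor is an added numerical safeguard that is not part of the stated ERM objective. Exhaustive search over $\mathcal{F}$ is only practical for small finite classes.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a context is a tuple of floats, an assortment is a
# tuple of item indices in 1..N, and the outcome i is 0 (no purchase) or an item in S.
Context = Tuple[float, ...]
Sample = Tuple[Context, Tuple[int, ...], int]
ValueFn = Callable[[Context], Sequence[float]]  # f(x)[j - 1] is the value of item j


def mnl_prob(S: Sequence[int], v: Sequence[float], i: int) -> float:
    """Assumed MNL choice probability mu_i(S, v); i = 0 is the no-purchase outcome."""
    denom = 1.0 + sum(v[j - 1] for j in S)
    return (1.0 if i == 0 else v[i - 1]) / denom


def log_loss(S: Sequence[int], v: Sequence[float], i: int, floor: float = 1e-12) -> float:
    """-log mu_i(S, v), clipped at `floor` so a zero-probability outcome stays finite."""
    return -math.log(max(mnl_prob(S, v, i), floor))


def erm(F: List[ValueFn], D: List[Sample]) -> ValueFn:
    """Return the f in the finite class F with the smallest cumulative log loss on D."""
    return min(F, key=lambda f: sum(log_loss(S, f(x), i) for x, S, i in D))
```

For instance, with two constant value functions returning `[0.9, 0.1]` and `[0.1, 0.9]`, and a dataset in which item 1 is chosen five times from $S = (1, 2)$ and nothing is bought once, `erm` selects the first function, since it assigns probability 0.45 rather than 0.05 to the frequent outcome.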

Theorems & Definitions (56)

Lemma 0
Lemma 0
Lemma 0
Theorem 1
Corollary 2
Lemma 2
Theorem 3
Corollary 4
Lemma 4
Lemma 4
...and 46 more
