Contextual Combinatorial Bandits with Probabilistically Triggered Arms

Xutong Liu; Jinhang Zuo; Siwei Wang; John C. S. Lui; Mohammad Hajiesmaili; Adam Wierman; Wei Chen

TL;DR

A novel analysis technique and variance-adaptive algorithm that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound and can be applied to the CMAB-T and C$^2$MAB settings, improving existing results there as well.

Abstract

We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions that capture a wide range of applications, such as contextual cascading bandits and contextual influence maximization bandits. Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel analysis that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound, removing a potentially exponentially large factor $O(1/p_{\min})$, where $d$ is the dimension of contexts, $p_{\min}$ is the minimum positive probability that any arm can be triggered, and batch-size $K$ is the maximum number of arms that can be triggered per round. Under the variance modulated (VM) or triggering probability and variance modulated (TPVM) conditions, we propose a new variance-adaptive algorithm VAC$^2$-UCB and derive a regret bound $\tilde{O}(d\sqrt{T})$, which is independent of the batch-size $K$. As a valuable by-product, our analysis technique and variance-adaptive algorithm can be applied to the CMAB-T and C$^2$MAB settings, improving existing results there as well. We also include experiments that demonstrate the improved performance of our algorithms compared with benchmark algorithms on synthetic and real-world datasets.
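To make the setting more concrete, the following is a minimal sketch of a generic LinUCB-style loop in which only the triggered arms contribute feedback. It is not the paper's C$^2$-UCB-T algorithm: the class name `LinUCBTriggered`, the exploration constant `rho`, the top-k selection standing in for an offline oracle, and the `observe` callback are all assumptions made for illustration.

```python
import math


class LinUCBTriggered:
    """Illustrative linear UCB estimator, updated only on triggered arms."""

    def __init__(self, d, reg=1.0, rho=1.0):
        self.d = d
        self.rho = rho  # exploration width (hypothetical constant)
        # Inverse of the regularized Gram matrix, initially (reg * I)^-1.
        self.G_inv = [[(1.0 / reg if i == j else 0.0) for j in range(d)]
                      for i in range(d)]
        self.b = [0.0] * d  # running sum of outcome * feature

    def _matvec(self, M, v):
        return [sum(M[i][j] * v[j] for j in range(self.d))
                for i in range(self.d)]

    def theta_hat(self):
        # Ridge-regression estimate G^-1 b.
        return self._matvec(self.G_inv, self.b)

    def ucb(self, phi):
        # Estimated mean plus rho times the confidence width ||phi||_{G^-1}.
        theta = self.theta_hat()
        mean = sum(t * x for t, x in zip(theta, phi))
        g_phi = self._matvec(self.G_inv, phi)
        width = math.sqrt(max(sum(x * g for x, g in zip(phi, g_phi)), 0.0))
        return mean + self.rho * width

    def update(self, phi, outcome):
        # Sherman-Morrison rank-one update of G^-1 after adding phi phi^T.
        g_phi = self._matvec(self.G_inv, phi)
        denom = 1.0 + sum(x * g for x, g in zip(phi, g_phi))
        for i in range(self.d):
            for j in range(self.d):
                self.G_inv[i][j] -= g_phi[i] * g_phi[j] / denom
        for i in range(self.d):
            self.b[i] += outcome * phi[i]


def play_round(model, features, k, observe):
    """Run one round.

    features: dict mapping arm -> feature vector for this round's context.
    observe:  hypothetical environment callback; given the chosen arms, it
              returns a dict arm -> outcome for the triggered arms only.
    """
    scores = {arm: model.ucb(phi) for arm, phi in features.items()}
    # Top-k by UCB score stands in for a problem-specific offline oracle.
    action = sorted(scores, key=scores.get, reverse=True)[:k]
    triggered = observe(action)
    for arm, outcome in triggered.items():
        model.update(features[arm], outcome)
    return action, triggered
```

In a cascading-style application, `observe` would return outcomes only for the arms examined before the user stops, so arms later in the list leave the estimator unchanged in that round.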

Paper Structure

This paper contains 33 sections, 67 equations, 2 figures, 3 tables, and 2 algorithms.

Introduction
Problem Setting
Key Quantities and Conditions
Algorithm and Regret Analysis for C$^2$MAB-T under the TPM Condition
Variance-Adaptive Algorithm and Analysis for C$^2$MAB-T under VM/TPVM Condition
Results and Analysis under VM condition
Results and Analysis under TPVM Condition
Applications and Experiments
Conclusion
Proofs for C$^2$MAB-T under the TPM Condition (Section \ref{sec:c2mab})
Proof of Theorem \ref{thm:c2mab_reg}
Important Lemmas used for proving Theorem \ref{thm:c2mab_reg}
Proofs for C$^2$MAB-T under the VM or TPVM Condition (Section \ref{sec:va_c2mab})
Proof of Lemma \ref{lem:va_c2mab_good_est}
Proof of Theorem \ref{thm:ccmab_reg_VM} under VM condition
...and 18 more sections

Figures (2)

Figure 1: Regret results for MovieLens data.
Figure 2: Results for synthetic data.
