Batched Online Contextual Sparse Bandits with Sequential Inclusion of Features

Rowan Swiers; Subash Prabanantham; Andrew Maher

TL;DR

This work tackles contextual MABs with linear rewards under sparsity and batched feedback, introducing Online Batched Sequential Inclusion (OBSI) to exclude irrelevant features from decision-making. OBSI sequentially includes features only after sufficient confidence that they influence the reward, using a threshold based on $\frac{\sum_A |\hat{\theta}_{t,A}^i|}{\sqrt{\sum_A \mathrm{Var}(\hat{\theta}_{t,A}^i)}} > \Phi^{-1}(\alpha)$ and online posterior updates with $B_{t,A}$ and $\hat{\theta}_{t,A}$ to share information across actions. The approach defines a fairness regret to quantify the impact of irrelevant features on action choices and demonstrates through synthetic experiments that OBSI reduces both regret and fairness regret while offering faster compute times than competing sparse-bandit methods. Empirically, OBSI outperforms baselines like LBGL and MCPB in key metrics and remains fully online, highlighting its practical relevance for fair, efficient online decision-making in sparse, batched contexts. The work also notes that sequential feature inclusion can be adapted to other posterior-based bandits, broadening its potential impact on online learning systems.
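
The inclusion threshold above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `include_feature` and its argument layout (one coefficient estimate and one posterior variance per action, for a single feature $i$) are assumptions made for illustration. It only evaluates the stated ratio and compares it with $\Phi^{-1}(\alpha)$.

```python
import math
from statistics import NormalDist


def include_feature(theta_hat, variances, alpha):
    """Return True if the feature passes the inclusion test.

    theta_hat: per-action estimates of the coefficient for one feature
               (hypothetical layout, one entry per action A).
    variances: per-action posterior variances of those estimates.
    alpha:     confidence level in (0, 1).
    """
    if len(theta_hat) != len(variances):
        raise ValueError("need one variance per action estimate")
    total_variance = sum(variances)
    if total_variance <= 0.0:
        raise ValueError("summed variance must be positive")

    # Numerator: sum over actions of the absolute coefficient estimates.
    numerator = sum(abs(t) for t in theta_hat)
    # Denominator: square root of the summed variances.
    statistic = numerator / math.sqrt(total_variance)

    # Threshold: standard normal quantile at level alpha.
    threshold = NormalDist().inv_cdf(alpha)
    return statistic > threshold


# A feature with large, well-determined coefficients is included ...
strong = include_feature([0.5, -0.4], [0.01, 0.01], alpha=0.95)
# ... while a feature with small, uncertain coefficients is not.
weak = include_feature([0.05, -0.02], [0.04, 0.05], alpha=0.95)
```

Because the statistic is non-negative and $\Phi^{-1}(\alpha)$ is negative for $\alpha < 0.5$, the test in this sketch only filters features when $\alpha$ is above 0.5.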

Abstract

Multi-armed Bandits (MABs) are increasingly employed in online platforms and e-commerce to optimize decision making for personalized user experiences. In this work, we focus on the Contextual Bandit problem with linear rewards, under conditions of sparsity and batched data. We address the challenge of fairness by excluding irrelevant features from decision-making processes using a novel algorithm, Online Batched Sequential Inclusion (OBSI), which sequentially includes features as confidence in their impact on the reward increases. Our experiments on synthetic data show the superior performance of OBSI compared to other algorithms in terms of regret, relevance of features used, and compute.

Paper Structure (13 sections, 5 equations, 4 figures, 1 table, 4 algorithms)

Introduction and Motivation
Related Work
Method
Overview
Formulation
Algorithm
Experiments
Conclusion
Appendix
Comparison of Regret Evolution
Comparison of Alpha Values
Comparison of Different Dimensions
Comparison Algorithm PseudoCode

Figures (4)

Figure 1: Regret of bandit algorithms
Figure 2: Regret for different alpha values
Figure 3: Fairness Regret for different alpha values
Figure 4: Regret for different dimensions
