High-Dimensional Linear Bandits under Stochastic Latent Heterogeneity
Elynn Chen, Xi Chen, Wenbo Jing, Xiao Liu
TL;DR
This work introduces a latent heterogeneous bandit framework for online decision-making when responses depend on unobserved subgroups in addition to observable context. It develops a phased EM-greedy algorithm that jointly learns latent subgroup probabilities and high-dimensional, group-specific reward parameters, with minimax-optimal guarantees for both learning and classification. It also identifies a fundamental stochastic barrier: even with perfect knowledge of the parameters, strong regret (measured against an oracle that observes the realized subgroups) must grow linearly because subgroup realizations are irreducibly random, whereas regular regret achieves a minimax-optimal sublinear rate. Empirical evaluations on simulations and real cash-bonus data demonstrate the practical benefits of incorporating latent heterogeneity for promotion targeting and related sequential personalization problems.
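To make "jointly learns latent subgroup probabilities and group-specific reward parameters" concrete, the sketch below runs textbook EM on a two-group mixture of linear regressions. It is not the authors' phased EM-greedy algorithm: it leaves out greedy action selection and phasing entirely, and it uses a single scalar covariate instead of high-dimensional contexts, where the M-step would need regularization. It also assumes a known common noise level `sigma` and a group-1 probability `pi1` that does not vary with context. The names `em_mixture_of_regressions` and `simulate` are hypothetical.

```python
import math
import random


def em_mixture_of_regressions(xs, ys, beta0, beta1, pi1, sigma=1.0, n_iter=50):
    """Textbook EM for y = beta_g * x + N(0, sigma^2), latent g ~ Bernoulli(pi1).

    Hypothetical toy: scalar feature, known common noise level `sigma`,
    and a mixing weight `pi1` that does not depend on x.
    """
    for _ in range(n_iter):
        # E-step: posterior probability that each observation came from group 1.
        resp = []
        for x, y in zip(xs, ys):
            log1 = math.log(pi1) - (y - beta1 * x) ** 2 / (2 * sigma ** 2)
            log0 = math.log(1 - pi1) - (y - beta0 * x) ** 2 / (2 * sigma ** 2)
            d = log1 - log0  # log-odds of group 1 versus group 0
            # Numerically stable logistic function of the log-odds.
            if d >= 0:
                resp.append(1.0 / (1.0 + math.exp(-d)))
            else:
                e = math.exp(d)
                resp.append(e / (1.0 + e))
        # M-step: mixing weight, clamped away from 0 and 1 so the logs stay finite.
        pi1 = min(max(sum(resp) / len(resp), 1e-12), 1 - 1e-12)
        # M-step: responsibility-weighted least-squares slope for each group.
        beta1 = sum(r * x * y for r, x, y in zip(resp, xs, ys)) / sum(
            r * x * x for r, x in zip(resp, xs)
        )
        beta0 = sum((1 - r) * x * y for r, x, y in zip(resp, xs, ys)) / sum(
            (1 - r) * x * x for r, x in zip(resp, xs)
        )
    return beta0, beta1, pi1


def simulate(n, beta0, beta1, pi1, sigma, seed=0):
    """Draw toy data: x ~ U(1, 2), latent group g, then y = beta_g * x + noise."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.uniform(1.0, 2.0)
        g = 1 if rng.random() < pi1 else 0
        beta = beta1 if g == 1 else beta0
        xs.append(x)
        ys.append(beta * x + rng.gauss(0.0, sigma))
    return xs, ys
```

On data simulated with `beta0 = -2`, `beta1 = 2`, `pi1 = 0.3`, and `sigma = 0.5`, starting EM from `(-1, 1, 0.5)` should return estimates close to those values. As with any EM procedure, a poor initialization can converge to a different fixed point or swap the group labels.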
Abstract
This paper addresses the critical challenge of stochastic latent heterogeneity in online decision-making, where individuals' responses to actions vary not only with observable contexts but also with unobserved, randomly realized subgroups. Existing data-driven approaches largely capture observable heterogeneity through contextual features but fail when the sources of variation are latent and stochastic. We propose a latent heterogeneous bandit framework that explicitly models probabilistic subgroup membership and group-specific reward functions, using promotion targeting as a motivating example. Our phased EM-greedy algorithm jointly learns latent group probabilities and reward parameters in high dimensions, achieving optimal estimation and classification guarantees. Our analysis reveals a new phenomenon unique to decision-making with stochastic latent subgroups: randomness in group realizations creates irreducible classification uncertainty, making sublinear regret against a fully informed strong oracle fundamentally impossible. We establish matching upper and minimax lower bounds for both the strong and regular regrets, which correspond, respectively, to oracles with and without access to realized group memberships. The strong regret necessarily grows linearly, while the regular regret achieves a minimax-optimal sublinear rate. These findings uncover a fundamental stochastic barrier in online decision-making and point to potential remedies through simple strategic interventions and mechanism-design-based elicitation of latent information.
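The gap between the strong and regular oracles can be illustrated with a toy calculation. The sketch assumes two latent groups and a single fixed context, and it treats the mean reward of each action in each group as known; all numbers are hypothetical. The regular oracle knows every parameter but not the realized group, so it must commit to the action with the best group-averaged reward. The strong oracle sees the realized group and always plays that group's best action. In this toy the expected per-round difference is constant, so the strong regret of even the regular oracle grows linearly in the horizon. The function names are illustrative and do not come from the paper.

```python
def oracle_gap(p_group1, rewards):
    """Expected per-round reward of the strong oracle minus the regular oracle.

    p_group1: probability that the latent group is 1 (it is 0 otherwise).
    rewards[g][a]: known mean reward of action a for group g (hypothetical).
    """
    n_actions = len(rewards[0])
    probs = (1.0 - p_group1, p_group1)
    # Strong oracle: observes the realized group and plays that group's best action.
    strong = sum(probs[g] * max(rewards[g]) for g in (0, 1))
    # Regular oracle: knows only the group probabilities, so it plays the
    # action whose reward, averaged over the two groups, is highest.
    regular = max(
        sum(probs[g] * rewards[g][a] for g in (0, 1)) for a in range(n_actions)
    )
    return strong - regular


def strong_regret_of_regular_oracle(p_group1, rewards, horizon):
    """The per-round gap is the same every round here, so regret is linear in T."""
    return oracle_gap(p_group1, rewards) * horizon
```

With `p_group1 = 0.3` and groups that prefer opposite actions (`[[1.0, 0.0], [0.0, 1.0]]`), the per-round gap is 0.3, so over 1,000 rounds the strong regret is about 300. If both groups share the same best action, the gap is zero. The barrier appears only when the latent groups disagree about which action is best.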
