High-Dimensional Linear Bandits under Stochastic Latent Heterogeneity

Elynn Chen, Xi Chen, Wenbo Jing, Xiao Liu

TL;DR

This work introduces a latent heterogeneous bandit framework to address online decision-making when responses depend on unobserved subgroups in addition to observable context. It develops a phased EM–greedy algorithm that jointly learns latent subgroup probabilities and high-dimensional, group-specific reward parameters, with minimax-optimal guarantees for learning and classification. A fundamental stochastic barrier is shown: even with perfect parameter knowledge, strong regret must grow linearly due to irreducible subgroup-realization randomness, while regular regret achieves a minimax-optimal sublinear rate. Empirical evaluations on simulations and real cash-bonus data demonstrate the practical benefits of incorporating latent heterogeneity for promotion targeting and related sequential personalization problems.

Abstract

This paper addresses the critical challenge of stochastic latent heterogeneity in online decision-making, where individuals' responses to actions vary not only with observable contexts but also with unobserved, randomly realized subgroups. Existing data-driven approaches largely capture observable heterogeneity through contextual features but fail when the sources of variation are latent and stochastic. We propose a latent heterogeneous bandit framework that explicitly models probabilistic subgroup membership and group-specific reward functions, using promotion targeting as a motivating example. Our phased EM-greedy algorithm jointly learns latent group probabilities and reward parameters in high dimensions, achieving optimal estimation and classification guarantees. Our analysis reveals a new phenomenon unique to decision-making with stochastic latent subgroups: randomness in group realizations creates irreducible classification uncertainty, making sub-linear regret against a fully informed strong oracle fundamentally impossible. We establish matching upper and minimax lower bounds for both the strong and regular regrets, corresponding, respectively, to oracles with and without access to realized group memberships. The strong regret necessarily grows linearly, while the regular regret achieves a minimax-optimal sublinear rate. These findings uncover a fundamental stochastic barrier in online decision-making and point to potential remedies through simple strategic interventions and mechanism-design-based elicitation of latent information.
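
The gap between the two oracles can be illustrated with a small simulation. The sketch below is not the paper's algorithm or experimental setup. It assumes two latent groups, a logistic membership model, Gaussian contexts, and $K$ arms with arm-specific features, and every name in it (`simulate_oracle_gap`, `theta`, `beta`) is hypothetical. Both oracles know the true parameters. Only the strong oracle also sees the realized group, so the per-round gap between them does not shrink over time.

```python
import numpy as np


def simulate_oracle_gap(T=2000, d=5, K=3, seed=0):
    """Per-round reward gap between a strong and a regular oracle.

    Assumed toy model (not taken from the paper):
      - group 0 is realized with probability sigmoid(z @ theta), else group 1
      - the mean reward of arm k for group g is X[k] @ beta[g]
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=d)        # membership parameters (assumed)
    beta = rng.normal(size=(2, d))    # one reward vector per group (assumed)
    gaps = np.empty(T)
    for t in range(T):
        z = rng.normal(size=d)        # context driving group membership
        X = rng.normal(size=(K, d))   # arm-specific features
        p0 = 1.0 / (1.0 + np.exp(-(z @ theta)))
        g = 0 if rng.random() < p0 else 1   # realized latent group
        means = X @ beta.T                  # shape (K, 2): arm by group
        # Strong oracle: knows the realized group g.
        strong_arm = int(np.argmax(means[:, g]))
        # Regular oracle: knows only the group probabilities.
        expected = p0 * means[:, 0] + (1.0 - p0) * means[:, 1]
        regular_arm = int(np.argmax(expected))
        gaps[t] = means[strong_arm, g] - means[regular_arm, g]
    return gaps


if __name__ == "__main__":
    gaps = simulate_oracle_gap()
    cumulative = np.cumsum(gaps)
    print("gap after T/2 rounds:", cumulative[len(gaps) // 2 - 1])
    print("gap after T rounds:  ", cumulative[-1])
```

Under these assumptions the cumulative gap roughly doubles when the horizon doubles, which is the linear growth the abstract attributes to randomness in group realizations rather than to estimation error.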

Paper Structure

This paper contains 25 sections, 9 theorems, 158 equations, 3 figures, and 2 algorithms.

Key Result

Theorem 1

Suppose Assumptions A1--A4 hold and $s^2\log d\log n_0 \lesssim n_0$. Let the initial estimators be $\boldsymbol{\gamma}^{(\tau, 0)}=\widehat{\boldsymbol{\gamma}}^{(\tau-1)}$ for $\tau \geq 2$. Furthermore, select $t_{\tau, \max}\asymp \log n_0$ for $\tau=1$ and $t_{\tau, \max}\asymp 1$ for $\tau\geq 2$ ... and with probability at least $1-d^{-1}$, where $(\widehat{\boldsymbol{\theta}}^{(\tau)},\ldots)$ ...
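
One way to read the iteration schedule in Theorem 1 is through a toy contraction model. The sketch below is an illustration under stated assumptions, not the paper's analysis: it supposes that each EM step shrinks the distance to a statistical floor by a constant factor `kappa`, that the floor scales as $1/\sqrt{n_\tau}$, and that phase lengths double ($n_\tau = n_0 2^{\tau-1}$). The function names and constants are hypothetical.

```python
import math


def em_iteration_budget(tau, n0, later_iters=2):
    """Iteration cap per phase: about log(n0) for tau = 1, a constant after."""
    if tau == 1:
        return max(1, math.ceil(math.log(n0)))
    return later_iters


def run_phases(num_phases, n0, init_error=1.0, kappa=0.5):
    """Track a toy estimation error across warm-started phases.

    Assumptions (not from the paper): the error contracts by `kappa` per
    EM step toward a floor of 1/sqrt(n_tau), and n_tau doubles each phase.
    """
    history = []
    error = init_error
    for tau in range(1, num_phases + 1):
        n_tau = n0 * 2 ** (tau - 1)
        floor = 1.0 / math.sqrt(n_tau)
        iters = em_iteration_budget(tau, n0)
        for _ in range(iters):
            # One toy EM step: geometric contraction toward the floor.
            error = floor + kappa * (error - floor)
        history.append((tau, iters, error, floor))
        # The next phase starts from this phase's final estimate (warm start).
    return history


if __name__ == "__main__":
    for tau, iters, error, floor in run_phases(num_phases=5, n0=1000):
        print(f"phase {tau}: {iters} iterations, "
              f"error {error:.4f}, floor {floor:.4f}")
```

In this toy model the first phase needs on the order of $\log n_0$ steps to bring a constant-size initial error down to the floor, while a warm-started later phase begins within a constant factor of its floor and needs only a constant number of steps.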

Figures (3)

  • Figure 1: Average strong and regular regrets with $s=20$, $d\in\{500, 1000\}$ and $\overline{L}\in \{2.5, 5\}$. The horizontal axis "time" represents the sample size $T$.
  • Figure 2: Estimation errors of the parameters $( \boldsymbol{\theta}^*,\boldsymbol{\beta}_1^*, \boldsymbol{\beta}^*_2)$ with $s=20$, $\overline{L}=2.5$, and $d\in\{500, 1000\}$. The horizontal axis "time" represents the sample size $T$.
  • Figure 3: Average strong and regular regrets of different methods on the cash bonus dataset.

Theorems & Definitions (14)

  • Remark 1
  • Remark 2: Initialization
  • Theorem 1
  • Remark 3
  • Theorem 2
  • Theorem 3
  • Remark 4
  • Theorem 4
  • Theorem 5
  • Lemma 6: Population EM iterates
  • ...and 4 more