Navigating Sparsities in High-Dimensional Linear Contextual Bandits
Rui Zhao, Zihan Chen, Zemin Zheng
TL;DR
This work tackles high-dimensional linear contextual bandits by identifying two sparsity structures (sparse model parameters and sparse eigenvalues of context covariances) and introducing a flexible PointWise Estimator (PWE) that adapts to both. Building on PWE, the HOPE algorithm employs an Explore-Then-Commit scheme to achieve sublinear regret across four scenarios (two homogeneous and two heterogeneous), using Lasso and RDL as initial estimators where appropriate. Theoretical analyses provide a universal regret bound and detailed rates for each scenario, showing that HOPE matches or surpasses prior methods in homogeneous settings and handles mixed-sparsity settings where existing approaches fail. Empirical results corroborate these findings: HOPE performs robustly across diverse covariance and sparsity regimes.
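To make the Explore-Then-Commit idea concrete, here is a minimal sketch of a generic ETC loop with a Lasso estimate of a single shared parameter vector. It is not the HOPE algorithm and does not implement PWE; the function names (`lasso_ista`, `explore_then_commit`), the shared-parameter reward model, the uniform exploration rule, and all tuning values are assumptions made for illustration only.

```python
import numpy as np


def lasso_ista(X, y, lam, n_iter=500):
    """Lasso via proximal gradient (ISTA) for
    (1/2n) * ||y - X b||^2 + lam * ||b||_1  (illustrative solver)."""
    n, d = X.shape
    # Step size = 1 / Lipschitz constant of the smooth part.
    step = n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad
        # Soft-thresholding is the proximal operator of the L1 penalty.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta


def explore_then_commit(contexts, beta_true, n_explore, lam, noise_sd, rng):
    """Generic ETC on contexts of shape (T, K, d).

    Assumes a shared-parameter linear reward model (hypothetical setup).
    Returns (cumulative regret of length T, fitted parameter estimate).
    """
    T, K, d = contexts.shape
    mean_rewards = contexts @ beta_true  # shape (T, K)
    X_hist = np.zeros((n_explore, d))
    y_hist = np.zeros(n_explore)
    beta_hat = np.zeros(d)
    regret = np.zeros(T)
    for t in range(T):
        if t < n_explore:
            # Exploration phase: pull an arm uniformly at random.
            arm = rng.integers(K)
            X_hist[t] = contexts[t, arm]
            y_hist[t] = mean_rewards[t, arm] + noise_sd * rng.standard_normal()
        else:
            if t == n_explore and n_explore > 0:
                # Fit once, then commit to the estimate.
                beta_hat = lasso_ista(X_hist, y_hist, lam)
            arm = int(np.argmax(contexts[t] @ beta_hat))
        regret[t] = mean_rewards[t].max() - mean_rewards[t, arm]
    return np.cumsum(regret), beta_hat
```

With Gaussian contexts and a parameter vector that has only a few nonzero entries, the per-round regret in this sketch is typically much smaller after the commit point than during exploration. The choice of exploration length is the usual ETC trade-off: a longer phase improves the estimate but incurs more exploration regret.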
Abstract
High-dimensional linear contextual bandit problems remain a significant challenge due to the curse of dimensionality. Existing methods typically assume either that the model parameters are sparse or that the eigenvalues of the context covariance matrices are (approximately) sparse, and the rigidity of conventional reward estimators limits their general applicability. To overcome this limitation, this work introduces a powerful pointwise estimator that adaptively navigates both kinds of sparsity. Based on this pointwise estimator, a novel algorithm, termed HOPE, is proposed. Theoretical analyses demonstrate that HOPE not only achieves improved regret bounds in the previously studied homogeneous settings (i.e., those with only one type of sparsity) but also, for the first time, efficiently handles two new and challenging heterogeneous settings (i.e., those with a mixture of the two types of sparsity), highlighting its flexibility and generality. Experiments corroborate the superiority of HOPE over existing methods across various scenarios.
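For readers who want the terms above in symbols, the following is one common textbook formalization of a linear contextual bandit, its regret, and the two kinds of sparsity. The notation ($x_{t,a}$, $\beta^*$, $s_0$, $\Sigma$, $\lambda_j$) and the use of effective rank as a proxy for approximately sparse eigenvalues are assumptions for illustration; the paper's exact definitions may differ.

```latex
% Reward model (assumed): context x_{t,a} \in \mathbb{R}^d for arm a at round t
r_t = x_{t,a_t}^\top \beta^* + \varepsilon_t, \qquad t = 1, \dots, T.

% Cumulative regret of a policy choosing arms a_1, \dots, a_T
\mathrm{Regret}(T) = \sum_{t=1}^{T} \Bigl( \max_{a} x_{t,a}^\top \beta^* - x_{t,a_t}^\top \beta^* \Bigr),
\qquad \text{sublinear regret: } \mathrm{Regret}(T) = o(T).

% Sparsity type 1 (assumed form): sparse model parameters
\|\beta^*\|_0 = s_0 \ll d.

% Sparsity type 2 (assumed form): approximately sparse eigenvalues of the
% context covariance \Sigma, with \lambda_1 \ge \dots \ge \lambda_d \ge 0
\frac{\operatorname{tr}(\Sigma)}{\lambda_1} = \frac{\sum_{j=1}^{d} \lambda_j}{\lambda_1} \ll d.
```

Under this reading, a homogeneous setting has every arm exhibiting the same type of sparsity, while a heterogeneous setting mixes the two types.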
