Efficient Generalized Low-Rank Tensor Contextual Bandits
Qianxin Yi, Yiyang Yang, Shaojie Tang, Jiapeng Liu, Yao Wang
TL;DR
The paper tackles sequential decision-making in contextual bandits with high-dimensional, multiway action features and nonlinear rewards. It introduces a generalized low-rank tensor contextual bandit model built on a transformed t-product, with an unknown parameter tensor $\mathcal{W}^*$ of low tubal rank, and proposes G-LowTESTR, a two-stage algorithm that first explores the low-rank subspace and then refines decisions with a projected generalized linear bandit (LowGLM-UCB). The analysis, built on local restricted convexity, yields a regret bound of $\widetilde{O}\left(\frac{d^2\sqrt{\ell T}}{\sqrt{a}(1-\gamma)}\right)$, which improves on the bounds obtained by vectorizing or matricizing the problem. Experiments on synthetic data and on real-world domains (precision medicine and online advertising) show that exploiting low-rank tensor structure improves learning efficiency and decision quality in nonlinear reward settings.
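To make the notion of tubal rank concrete, the following is a minimal Python sketch. It assumes the classical DFT-based t-product, which is one special case of a transformed t-product; other transforms are not covered here. The helper names `t_product` and `tubal_rank` and the tensor sizes are illustrative choices, not taken from the paper.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x r x n3) and B (r x n2 x n3), assuming the DFT transform.

    In the Fourier domain along the third mode, the t-product reduces to
    ordinary matrix products of matching frontal slices.
    """
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.einsum("irk,rjk->ijk", A_hat, B_hat)  # slice-wise matrix product
    return np.real(np.fft.ifft(C_hat, axis=2))       # real inputs give a real result

def tubal_rank(W, tol=1e-8):
    """Tubal rank: largest matrix rank among the Fourier-domain frontal slices."""
    W_hat = np.fft.fft(W, axis=2)
    return max(
        int(np.linalg.matrix_rank(W_hat[:, :, k], tol=tol))
        for k in range(W.shape[2])
    )

# Hypothetical sizes: a 6 x 5 x 4 tensor built from two thin factors (r = 2).
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2, 4))
B = rng.standard_normal((2, 5, 4))
W = t_product(A, B)

print(W.shape)        # (6, 5, 4)
print(tubal_rank(W))  # at most 2 by construction
```

The tensor `W` has 120 entries but is determined by the two thin factors `A` and `B`, which is the kind of structure a low-tubal-rank model exploits.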
Abstract
In this paper, we aim to build a novel bandit algorithm that fully harnesses the power of multi-dimensional data and the inherent non-linearity of reward functions to provide highly usable and accountable decision-making services. To this end, we introduce a generalized low-rank tensor contextual bandit model in which an action is formed from three feature vectors and can thus be represented by a tensor. In this formulation, the reward is determined by a generalized linear function applied to the inner product of the action's feature tensor and a fixed but unknown parameter tensor with low tubal rank. To balance exploration and exploitation effectively, we introduce a novel algorithm called "Generalized Low-Rank Tensor Exploration Subspace then Refine" (G-LowTESTR). This algorithm first collects raw data to explore the intrinsic low-rank tensor subspace information embedded in the decision-making scenario, and then converts the original problem into an approximately lower-dimensional generalized linear contextual bandit problem. Rigorous theoretical analysis shows that the regret bound of G-LowTESTR improves on those obtained in the vectorization and matricization cases. We conduct a series of simulations and real-data experiments to further highlight the effectiveness of G-LowTESTR, which stems from its ability to exploit the low-rank tensor structure for enhanced learning.
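The reward model described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the action tensor is assumed to be the outer product of its three feature vectors, the link function is assumed to be logistic, and the parameter tensor is a rank-one outer product, which is one simple instance of a low-tubal-rank tensor. All names and sizes (`action_tensor`, `expected_reward`, `W_star`, and so on) are hypothetical.

```python
import numpy as np

def action_tensor(x, y, z):
    """Assumed construction: outer product of the three feature vectors."""
    return np.einsum("i,j,k->ijk", x, y, z)

def expected_reward(X, W):
    """Logistic link (an assumption) applied to the tensor inner product <X, W>."""
    inner = np.sum(X * W)
    return 1.0 / (1.0 + np.exp(-inner))

rng = np.random.default_rng(0)
d1, d2, d3 = 4, 3, 2  # hypothetical feature dimensions

# Unknown parameter tensor: a rank-one outer product, chosen for illustration.
a = rng.standard_normal(d1)
b = rng.standard_normal(d2)
c = rng.standard_normal(d3)
W_star = np.einsum("i,j,k->ijk", a, b, c)

# One action built from three feature vectors.
x = rng.standard_normal(d1)
y = rng.standard_normal(d2)
z = rng.standard_normal(d3)
X = action_tensor(x, y, z)

mu = expected_reward(X, W_star)  # mean reward, strictly between 0 and 1
reward = rng.binomial(1, mu)     # observed binary reward for this round
print(X.shape, float(mu), int(reward))
```

With these rank-one choices the inner product factorizes as $(x^\top a)(y^\top b)(z^\top c)$, so the learner effectively faces $d_1 + d_2 + d_3$ unknowns rather than $d_1 d_2 d_3$, which gives some intuition for why exploiting tensor structure can help compared with vectorizing the problem.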
