Efficient Generalized Low-Rank Tensor Contextual Bandits

Qianxin Yi; Yiyang Yang; Shaojie Tang; Jiapeng Liu; Yao Wang

TL;DR

The paper tackles sequential decision-making in contextual bandits with high-dimensional, multiway action features and nonlinear rewards. It introduces a generalized low-rank tensor contextual bandit model built on the transformed t-product, with an unknown parameter tensor $\mathcal{W}^*$ of low tubal rank, and proposes G-LowTESTR, a two-stage algorithm that first explores the low-rank subspace and then refines decisions via a projected generalized linear bandit (LowGLM-UCB). The theoretical analysis rests on local restricted strong convexity (LRSC) and yields a regret bound of $\widetilde{O}\left(\frac{d^2\sqrt{\ell T}}{\sqrt{a}(1-\gamma)}\right)$, which improves on the bounds obtained by vectorizing or matricizing the action tensor. Experiments on synthetic data and two real-world domains (precision medicine and online advertising) show that exploiting the low-rank tensor structure improves learning efficiency and decision quality in nonlinear reward settings.

Abstract

In this paper, we aim to build a novel bandit algorithm that fully harnesses the power of multi-dimensional data and the inherent non-linearity of reward functions to provide highly usable and accountable decision-making services. To this end, we introduce a generalized low-rank tensor contextual bandit model in which an action is formed from three feature vectors and can therefore be represented by a tensor. In this formulation, the reward is determined by a generalized linear function applied to the inner product of the action's feature tensor and a fixed but unknown parameter tensor with low tubal rank. To balance exploration and exploitation effectively, we introduce a novel algorithm called "Generalized Low-Rank Tensor Exploration Subspace then Refine" (G-LowTESTR). The algorithm first collects raw data to explore the intrinsic low-rank tensor subspace embedded in the decision-making scenario, and then converts the original problem into an almost lower-dimensional generalized linear contextual bandit problem. Rigorous theoretical analysis shows that the regret bound of G-LowTESTR is superior to those obtained in the vectorization and matricization cases. A series of simulations and real-data experiments further highlights the effectiveness of G-LowTESTR, which capitalizes on the low-rank tensor structure for enhanced learning.
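The reward model described in the abstract can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration rather than the authors' code: the dimensions, the helper names (`t_product`, `action_tensor`, `mean_reward`), and the use of the DFT-based t-product as one special case of the transformed t-product. The two link functions mirror the binary and Poisson reward settings named in the paper's outline.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, d3, r = 6, 5, 4, 2  # hypothetical tensor sizes and tubal rank


def t_product(A, B):
    """t-product computed slice-wise in the Fourier domain (DFT special case)."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    # Multiply matching frontal slices in the transform domain.
    C_hat = np.einsum("ikt,kjt->ijt", A_hat, B_hat)
    # For real inputs the result is real up to rounding error.
    return np.real(np.fft.ifft(C_hat, axis=2))


# Unknown parameter tensor with tubal rank at most r (assumed construction).
W_star = t_product(rng.normal(size=(d1, r, d3)), rng.normal(size=(r, d2, d3)))
W_star /= np.linalg.norm(W_star)


def action_tensor(u, v, w):
    """Action built from three feature vectors as their outer product."""
    return np.einsum("i,j,k->ijk", u, v, w)


def mean_reward(X, W, link="logistic"):
    """Generalized linear mean reward mu(<X, W>)."""
    s = np.sum(X * W)  # tensor inner product
    if link == "logistic":  # binary rewards
        return 1.0 / (1.0 + np.exp(-s))
    if link == "poisson":  # count rewards
        return np.exp(s)
    raise ValueError(f"unknown link: {link}")


u, v, w = rng.normal(size=d1), rng.normal(size=d2), rng.normal(size=d3)
X = action_tensor(u, v, w)
p = mean_reward(X, W_star, link="logistic")
binary_reward = rng.binomial(1, p)
count_reward = rng.poisson(mean_reward(X, W_star, link="poisson"))
```

Because the action is an outer product, the inner product with `W_star` is a trilinear form in the three feature vectors, which is the structure that vectorizing or matricizing the action would discard.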

Paper Structure (25 sections, 10 theorems, 89 equations, 5 figures, 1 table, 2 algorithms)

This paper contains 25 sections, 10 theorems, 89 equations, 5 figures, 1 table, 2 algorithms.

Introduction
Applications of generalized low-rank tensor contextual bandit
Binary rewards in precision medicine
Poisson (count-based) rewards in online advertising
Contributions
Modeling paradigm
Algorithmic paradigm
Theoretical implications
Practical implications
Organization of the paper
Related work
The generalized low-rank tensor contextual bandits model
Efficient algorithm for the generalized low-rank tensor contextual bandits
The G-LowTESTR algorithm
Explore the low-rank subspace
...and 10 more sections

Key Result

Theorem 1

Figures (5)

Figure 1: Motivating binary rewards example: precise drug recommendation
Figure 2: Motivating Poisson rewards example: online advertising
Figure 3: Illustration of G-LowTESTR
Figure 4: Comparison analysis with other algorithms in synthetic data
Figure 5: Comparison analysis with other algorithms in real data

Theorems & Definitions (27)

Definition 1: Local restricted strong convexity (LRSC).
Theorem 1: Convergence under LRSC
Corollary 1
Corollary 2
Theorem 2: Regret of G-LowTESTR
Remark 1
Definition 2: Transformed t-product (Kernfeld et al., 2015)
Definition 3: Conjugate transpose (Kernfeld et al., 2015)
Definition 4: Identity tensor (Kernfeld et al., 2015)
Definition 5: Unitary tensor (Kernfeld et al., 2015)
...and 17 more
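
The transformed t-product and identity tensor listed above can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the transform matrix `M` is an arbitrary real orthogonal matrix chosen for the example, and tubal rank is computed as the largest rank among the frontal slices in the transform domain.

```python
import numpy as np


def mode3_transform(A, M):
    """Apply the matrix M along the third mode, i.e. to every tube of A."""
    return np.einsum("ts,ijs->ijt", M, A)


def transformed_t_product(A, B, M):
    """Transform both tensors, multiply frontal slices, transform back."""
    A_hat = mode3_transform(A, M)
    B_hat = mode3_transform(B, M)
    C_hat = np.einsum("ikt,kjt->ijt", A_hat, B_hat)
    return mode3_transform(C_hat, np.linalg.inv(M))


def tubal_rank(A, M, tol=1e-8):
    """Largest rank among the frontal slices in the transform domain."""
    A_hat = mode3_transform(A, M)
    return max(
        int(np.linalg.matrix_rank(A_hat[:, :, t], tol=tol))
        for t in range(A.shape[2])
    )


rng = np.random.default_rng(1)
d1, d2, d3, r = 5, 4, 3, 2  # hypothetical sizes

# Assumed transform: a random real orthogonal matrix obtained from a QR factorization.
M, _ = np.linalg.qr(rng.normal(size=(d3, d3)))

A = rng.normal(size=(d1, r, d3))
B = rng.normal(size=(r, d2, d3))
C = transformed_t_product(A, B, M)  # d1 x d2 x d3 tensor of tubal rank r

# Identity tensor: every frontal slice is the identity matrix in the transform domain.
I_hat = np.stack([np.eye(d2)] * d3, axis=2)
I = mode3_transform(I_hat, np.linalg.inv(M))
```

Multiplying `C` by `I` under the same transform returns `C`, and `tubal_rank(C, M)` recovers the factor width `r`. Replacing `M` with the DFT matrix gives the classical t-product.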
