Efficient Generalized Low-Rank Tensor Contextual Bandits

Qianxin Yi, Yiyang Yang, Shaojie Tang, Jiapeng Liu, Yao Wang

TL;DR

The paper tackles sequential decision-making in contextual bandits with high-dimensional, multiway action features and nonlinear rewards. It introduces a generalized low-rank tensor contextual bandit model built on the transformed t-product, with an unknown parameter tensor $\mathcal{W}^*$ of low tubal rank, and proposes G-LowTESTR, a two-stage algorithm that first explores the low-rank subspace and then refines decisions with a projected generalized linear bandit (LowGLM-UCB). The theory gives a convergence analysis based on local restricted strong convexity (LRSC) and a regret bound of $\widetilde{O}\left(\frac{d^2\sqrt{\ell T}}{\sqrt{a}(1-\gamma)}\right)$, which improves on the bounds obtained by vectorization and matricization. Experiments on synthetic data and two real-world domains (precision medicine and online advertising) show that exploiting low-rank tensor structure improves learning efficiency and decision quality in nonlinear reward settings.

Abstract

In this paper, we aim to build a novel bandit algorithm that fully harnesses multi-dimensional data and the inherent non-linearity of reward functions to provide highly usable and accountable decision-making services. To this end, we introduce a generalized low-rank tensor contextual bandit model in which an action is formed from three feature vectors and can thus be represented by a tensor. In this formulation, the reward is determined by a generalized linear function applied to the inner product of the action's feature tensor and a fixed but unknown parameter tensor with low tubal rank. To balance exploration and exploitation, we introduce a novel algorithm called "Generalized Low-Rank Tensor Exploration Subspace then Refine" (G-LowTESTR). The algorithm first collects raw data to explore the intrinsic low-rank tensor subspace embedded in the decision-making scenario, and then converts the original problem into an approximately lower-dimensional generalized linear contextual bandit problem. Rigorous theoretical analysis shows that the regret bound of G-LowTESTR is superior to those obtained in the vectorization and matricization cases. A series of simulations and real-data experiments further highlights the effectiveness of G-LowTESTR, which capitalizes on the low-rank tensor structure for enhanced learning.
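The reward model described in the abstract can be illustrated with a minimal NumPy sketch. Several choices here are assumptions made only for illustration and are not taken from the paper: the action tensor is built as the outer product of the three feature vectors, the link is logistic with a Bernoulli reward (in the spirit of the binary-reward example), and the names `action_tensor`, `expected_reward`, and `w_star` are hypothetical. The parameter tensor is drawn at random and is not constrained to have low tubal rank.

```python
import numpy as np


def action_tensor(x, y, z):
    """Assumed construction: outer product of three feature vectors."""
    return np.einsum("i,j,k->ijk", x, y, z)


def logistic(s):
    """Logistic link, one possible generalized linear link (an assumption)."""
    return 1.0 / (1.0 + np.exp(-s))


def expected_reward(action, w_star, link=logistic):
    """Mean reward: link applied to the tensor inner product <action, w_star>."""
    return link(np.sum(action * w_star))


rng = np.random.default_rng(0)
d1, d2, d3 = 4, 3, 2

# Placeholder parameter tensor; a real instance would have low tubal rank.
w_star = rng.normal(size=(d1, d2, d3))

x, y, z = rng.normal(size=d1), rng.normal(size=d2), rng.normal(size=d3)
a = action_tensor(x, y, z)

mu = expected_reward(a, w_star)   # mean reward in (0, 1)
reward = rng.binomial(1, mu)      # one observed binary reward
```

Replacing `logistic` with `np.exp` and the Bernoulli draw with `rng.poisson` would give a Poisson-reward variant under the same assumed structure.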

Paper Structure

This paper contains 25 sections, 10 theorems, 89 equations, 5 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

Figures (5)

  • Figure 1: Motivating binary rewards example: precise drug recommendation
  • Figure 2: Motivating Poisson rewards example: online advertising
  • Figure 3: Illustration of G-LowTESTR
  • Figure 4: Comparison analysis with other algorithms in synthetic data
  • Figure 5: Comparison analysis with other algorithms in real data

Theorems & Definitions (27)

  • Definition 1: Local restricted strong convexity (LRSC).
  • Theorem 1: Convergence under LRSC
  • Corollary 1
  • Corollary 2
  • Theorem 2: Regret of G-LowTESTR
  • Remark 1
  • Definition 2: Transformed t-product (Kernfeld et al., 2015)
  • Definition 3: Conjugate transpose (Kernfeld et al., 2015)
  • Definition 4: Identity tensor (Kernfeld et al., 2015)
  • Definition 5: Unitary tensor (Kernfeld et al., 2015)
  • ...and 17 more
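
The transformed t-product and tubal rank named in the definitions above can be sketched as follows. This is only an illustration of the usual construction (transform along the third mode, multiply frontal slices, invert the transform), and it assumes the discrete Fourier transform as the invertible transform; the paper's definitions may use a different transform, and the function names are hypothetical.

```python
import numpy as np


def t_product(a, b):
    """t-product of a (n1 x n2 x n3) and b (n2 x n4 x n3), assuming the DFT
    along the third mode as the invertible transform."""
    a_hat = np.fft.fft(a, axis=2)
    b_hat = np.fft.fft(b, axis=2)
    # Multiply matching frontal slices in the transform domain.
    c_hat = np.einsum("ijk,jlk->ilk", a_hat, b_hat)
    # For real inputs the result is real up to rounding error.
    return np.fft.ifft(c_hat, axis=2).real


def identity_tensor(n, n3):
    """Identity under the DFT-based t-product: first frontal slice is I,
    the remaining slices are zero."""
    eye = np.zeros((n, n, n3))
    eye[:, :, 0] = np.eye(n)
    return eye


def tubal_rank(a, tol=1e-10):
    """Largest matrix rank among the frontal slices in the transform domain."""
    a_hat = np.fft.fft(a, axis=2)
    return max(
        np.linalg.matrix_rank(a_hat[:, :, k], tol=tol)
        for k in range(a.shape[2])
    )


rng = np.random.default_rng(0)
u = rng.normal(size=(5, 2, 3))
v = rng.normal(size=(2, 4, 3))
w = t_product(u, v)   # 5 x 4 x 3 tensor with tubal rank at most 2
```

Building `w` as a t-product of two thin factors is one way to generate a low-tubal-rank parameter tensor for synthetic experiments.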