Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems

Yue Kang; Cho-Jui Hsieh; Thomas C. M. Lee

TL;DR

This work tackles generalized low-rank matrix bandits under a GLM setting by proposing two scalable, two-stage frameworks, G-ESTT and G-ESTS. Stage 1 uses Stein's method to estimate a low-rank subspace of the unknown parameter matrix $\Theta^*$ via nuclear-norm regularization, yielding a high-probability Frobenius error bound. In Stage 2, G-ESTT rotates the problem into a transformed GLM bandit and applies LowGLM-UCB, while G-ESTS performs a subspace-exclusion reduction to a smaller GLM bandit solvable by fast algorithms like SGD-TS. Theoretical regret guarantees show favorable bounds that scale with problem dimensions and rank, and experiments confirm both effectiveness and computational efficiency, highlighting practical advantages for high-dimensional, structured bandit problems.
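The subspace-exclusion step of G-ESTS can be illustrated with a minimal sketch. It assumes that estimated orthonormal bases `U_hat` ($d_1 \times r$) and `V_hat` ($d_2 \times r$) are already available from Stage 1; the function names and the coordinate ordering are hypothetical choices made for this illustration, not the paper's implementation. Each feature matrix is rotated into the estimated bases and the $(d_1-r)\times(d_2-r)$ block lying outside both estimated subspaces is dropped, leaving a vector of length $(d_1+d_2)r - r^2$.

```python
import numpy as np


def complete_basis(B):
    """Extend the columns of B (d x r, full column rank) to an orthonormal basis of R^d.

    The first r columns of the returned d x d matrix span the column space of B.
    """
    Q, _ = np.linalg.qr(B, mode="complete")
    return Q


def reduce_features(X, U_hat, V_hat):
    """Rotate X into the estimated bases and drop the excluded block.

    Hypothetical helper: keeps the first r rows and the first r columns of the
    rotated matrix, i.e. (d1 + d2) * r - r^2 coordinates in total.
    """
    r = U_hat.shape[1]
    X_rot = complete_basis(U_hat).T @ X @ complete_basis(V_hat)
    return np.concatenate([X_rot[:r, :].ravel(), X_rot[r:, :r].ravel()])
```

When the estimated subspaces coincide with the true ones, applying `reduce_features` to both $X$ and $\Theta^*$ preserves the inner product $\langle X, \Theta^* \rangle$ exactly, so the reduced problem is a GLM bandit in the smaller dimension. With estimation error, the dropped block contributes a small misspecification term.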

Abstract

In the stochastic contextual low-rank matrix bandit problem, the expected reward of an action is given by the inner product between the action's feature matrix and some fixed, but initially unknown, $d_1$ by $d_2$ matrix $\Theta^*$ with rank $r \ll \min\{d_1, d_2\}$, and an agent sequentially takes actions based on past experience to maximize the cumulative reward. In this paper, we study the generalized low-rank matrix bandit problem, which was recently proposed in \cite{lu2021low} under the Generalized Linear Model (GLM) framework. To overcome the computational infeasibility and theoretical restrictions of existing algorithms for this problem, we first propose the G-ESTT framework, which modifies the idea from \cite{jun2019bilinear} by using Stein's method for subspace estimation and then leverages the estimated subspaces via a regularization idea. Furthermore, we substantially improve the efficiency of G-ESTT by instead using a novel exclusion idea on the estimated subspace, and propose the G-ESTS framework. We also show that G-ESTT can achieve a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)MrT})$ while G-ESTS can achieve a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)^{3/2}Mr^{3/2}T})$ under mild assumptions, up to logarithmic terms, where $M$ is a problem-dependent value. Under a reasonable assumption that $M = O((d_1+d_2)^2)$ in our problem setting, the regret of G-ESTT is consistent with the current best regret of $\tilde{O}((d_1+d_2)^{3/2} \sqrt{rT}/D_{rr})$~\citep{lu2021low} ($D_{rr}$ will be defined later). For completeness, we conduct experiments to illustrate that our proposed algorithms, especially G-ESTS, are also computationally tractable and consistently outperform other state-of-the-art (generalized) linear matrix bandit methods on a suite of simulations.
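To make the comparison of rates concrete, the stated bounds can be specialized by plugging in $M = O((d_1+d_2)^2)$. This is plain arithmetic on the expressions given in the abstract, with constants and logarithmic factors ignored:

$$\sqrt{(d_1+d_2)\,M\,r\,T} \;\lesssim\; \sqrt{(d_1+d_2)^{3}\,r\,T} \;=\; (d_1+d_2)^{3/2}\sqrt{rT},$$

$$\sqrt{(d_1+d_2)^{3/2}\,M\,r^{3/2}\,T} \;\lesssim\; \sqrt{(d_1+d_2)^{7/2}\,r^{3/2}\,T} \;=\; (d_1+d_2)^{7/4}\,r^{3/4}\sqrt{T}.$$

The first line has the same dependence on $d_1+d_2$, $r$, and $T$ as the $\tilde{O}((d_1+d_2)^{3/2}\sqrt{rT}/D_{rr})$ rate, leaving the factor $1/D_{rr}$ aside.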

Paper Structure (45 sections, 23 theorems, 144 equations, 2 figures, 2 tables, 6 algorithms)

Introduction
Related Work
Preliminaries
Main Results
Stage 1: Subspace Exploration
Stage 2 of G-ESTT
Overall regret of G-ESTT
Stage 2 of G-ESTS
Overall regret of G-ESTS
Experiments
Conclusion
Clarification about $\sigma_0^2$
Proof of Theorem \ref{thm_rscbound}
Useful Lemmas
Proof of Theorem \ref{thm_rscbound}
...and 30 more sections

Key Result

Theorem 4.1

(Bounds for GLM) For any low-rank generalized linear model with samples $X_1,\dots,X_{T_1}$ drawn from $\mathcal{X}$ according to $\mathcal{D}$ in Assumption assu_sampling, and assuming that Assumptions assu_bound and assu_link hold, then for the optimal solution to the nuclear norm regularization problem lo, with probability at least $1-\delta$ it holds that: for $C_1=36(4\sigma_0^2+S_f^2)$ and some nonze
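A nuclear-norm-regularized estimator of the kind this theorem concerns can be sketched as follows. The sketch rests on several assumptions that are not taken from the text above: contexts have i.i.d. standard Gaussian entries (so the score function is the identity, $S(X) = X$), the loss is the quadratic form $\|\Theta\|_F^2 - \frac{2}{T_1}\sum_i y_i \langle X_i, \Theta\rangle$, and no truncation of $y_i S(X_i)$ is applied. Under those assumptions the regularized problem has a closed-form solution by soft-thresholding singular values, and by Stein's lemma the result estimates $\Theta^*$ only up to a positive scale factor, which is enough for recovering its row and column subspaces. All function names are hypothetical.

```python
import numpy as np


def soft_threshold_svd(A, tau):
    """Shrink every singular value of A by tau, clipping at zero."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def estimate_subspaces(X, y, lam, r):
    """Hypothetical Stage 1 sketch under the Gaussian-context assumption.

    X: array of shape (T1, d1, d2) with i.i.d. N(0, 1) entries (assumption).
    y: array of shape (T1,) with observed rewards.
    Minimizes ||Theta||_F^2 - (2/T1) * sum_i y_i <X_i, Theta> + lam * ||Theta||_*,
    whose minimizer is the singular-value soft-thresholding of the
    reward-weighted average of the contexts at level lam / 2.
    """
    A = np.einsum("t,tij->ij", y, X) / len(y)
    theta_hat = soft_threshold_svd(A, lam / 2.0)
    U, _, Vt = np.linalg.svd(theta_hat)
    return theta_hat, U[:, :r], Vt[:r].T
```

The returned `U_hat` and `V_hat` are the top-$r$ left and right singular vectors of the estimate. If `lam` is chosen so large that every singular value is shrunk to zero, the estimate is the zero matrix and the returned bases carry no information.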

Figures (2)

Figure 1: Plots of regret curves of algorithms G-ESTS, G-ESTT, SGD-TS and LowESTR under four settings ($480$ arms). (a): diagonal $\Theta^*$, $d_1=d_2=10$, $r=1$; (b): diagonal $\Theta^*$, $d_1=d_2=12$, $r=1$; (c): non-diagonal $\Theta^*$, $d_1=d_2=10$, $r=2$; (d): non-diagonal $\Theta^*$, $d_1=d_2=12$, $r=2$.
Figure 2: Plots of regret curves of algorithms G-ESTT, G-ESTS, SGD-TS and LowESTR under four settings ($1000$ arms). (a): diagonal $\Theta^*$, $d_1=d_2=10$, $r=1$; (b): diagonal $\Theta^*$, $d_1=d_2=12$, $r=1$; (c): non-diagonal $\Theta^*$, $d_1=d_2=10$, $r=2$; (d): non-diagonal $\Theta^*$, $d_1=d_2=12$, $r=2$.

Theorems & Definitions (27)

Definition 3.1
Definition 3.2
Theorem 4.1
Theorem 4.2
Theorem 4.3
Lemma A.1
Lemma B.1
Lemma B.2
Lemma B.3
Lemma B.4
...and 17 more
