Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems

Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee

TL;DR

This work tackles generalized low-rank matrix bandits under a GLM setting by proposing two scalable, two-stage frameworks, G-ESTT and G-ESTS. Stage 1 uses Stein's method to estimate the low-rank subspaces of the unknown parameter matrix $\Theta^*$ via nuclear-norm regularization, yielding a high-probability Frobenius error bound. In Stage 2, G-ESTT rotates the problem into a transformed GLM bandit and applies LowGLM-UCB, while G-ESTS performs a subspace-exclusion reduction to a smaller GLM bandit solvable by fast algorithms like SGD-TS. G-ESTT attains a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)MrT})$ and G-ESTS a bound of $\tilde{O}(\sqrt{(d_1+d_2)^{3/2}Mr^{3/2}T})$, where $M$ is a problem-dependent quantity. Simulations indicate that both frameworks, especially G-ESTS, are computationally tractable and outperform existing (generalized) linear matrix bandit methods.
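The two stages can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes contexts with i.i.d. standard Gaussian entries, for which Stein's identity gives $\mathbb{E}[yX] \propto \Theta^*$, and it solves the illustrative objective $\|\Theta\|_F^2 - 2\langle M, \Theta\rangle + \lambda\|\Theta\|_*$ in closed form by singular value soft-thresholding. The function names `estimate_subspaces` and `reduce_arm`, the parameter `lam`, and the choice of which rotated entries to keep are assumptions made for illustration.

```python
import numpy as np

def estimate_subspaces(X, y, lam):
    """Stage 1 sketch (assumes i.i.d. N(0, 1) context entries).

    X: (T, d1, d2) contexts, y: (T,) rewards, lam: nuclear-norm weight.
    Returns a low-rank estimate and full orthonormal bases U (d1 x d1), V (d2 x d2).
    """
    # Stein-type moment estimate: for standard Gaussian X, E[y X] is proportional to Theta*.
    M = np.einsum("t,tij->ij", y, X) / len(y)
    U, s, Vt = np.linalg.svd(M, full_matrices=True)
    # Minimizer of ||Theta||_F^2 - 2<M, Theta> + lam * ||Theta||_* is the
    # soft-thresholding of M's singular values at lam / 2.
    s_thr = np.maximum(s - lam / 2.0, 0.0)
    k = len(s)
    theta_hat = (U[:, :k] * s_thr) @ Vt[:k, :]
    # Soft-thresholding keeps M's singular vectors, so the leading columns of
    # U and V span the estimated column and row subspaces.
    return theta_hat, U, Vt.T

def reduce_arm(X, U, V, r):
    """Stage 2 sketch of a subspace-exclusion reduction (illustrative).

    Rotates X into the estimated bases and drops the bottom-right
    (d1 - r) x (d2 - r) block, where the rotated parameter is near zero.
    Returns a vector of length (d1 + d2) * r - r**2.
    """
    Xr = U.T @ X @ V
    return np.concatenate([Xr[:r, :].ravel(), Xr[r:, :r].ravel()])
```

Under these assumptions, each reduced arm has $(d_1+d_2)r - r^2$ coordinates instead of $d_1 d_2$, so any generalized linear bandit algorithm can then be run on the reduced vectors.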

Abstract

In the stochastic contextual low-rank matrix bandit problem, the expected reward of an action is given by the inner product between the action's feature matrix and some fixed, but initially unknown, $d_1$ by $d_2$ matrix $\Theta^*$ with rank $r \ll \min\{d_1, d_2\}$, and an agent sequentially takes actions based on past experience to maximize the cumulative reward. In this paper, we study the generalized low-rank matrix bandit problem, which has been recently proposed in \cite{lu2021low} under the Generalized Linear Model (GLM) framework. To overcome the computational infeasibility and theoretical restrictions of existing algorithms for this problem, we first propose the G-ESTT framework, which modifies the idea from \cite{jun2019bilinear} by using Stein's method for the subspace estimation and then leverages the estimated subspaces via a regularization idea. Furthermore, we substantially improve the efficiency of G-ESTT by using a novel exclusion idea on the estimated subspaces instead, and propose the G-ESTS framework. We also show that G-ESTT can achieve a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)MrT})$ while G-ESTS can achieve a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)^{3/2}Mr^{3/2}T})$ under mild assumptions, up to logarithmic terms, where $M$ is some problem-dependent value. Under a reasonable assumption that $M = O((d_1+d_2)^2)$ in our problem setting, the regret of G-ESTT is consistent with the current best regret of $\tilde{O}((d_1+d_2)^{3/2} \sqrt{rT}/D_{rr})$~\citep{lu2021low} ($D_{rr}$ will be defined later). For completeness, we conduct experiments to illustrate that our proposed algorithms, especially G-ESTS, are also computationally tractable and consistently outperform other state-of-the-art (generalized) linear matrix bandit methods in a suite of simulations.
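The consistency claim for G-ESTT can be checked by direct substitution of $M = O((d_1+d_2)^2)$ into its regret bound. The short derivation below uses only quantities stated in the abstract and sets aside the $1/D_{rr}$ factor, which is not defined here:

$$
\sqrt{(d_1+d_2)\,M\,r\,T} \;=\; \sqrt{(d_1+d_2)\cdot O\big((d_1+d_2)^2\big)\cdot r\,T} \;=\; O\big((d_1+d_2)^{3/2}\sqrt{rT}\big).
$$

The dependence on $d_1+d_2$, $r$, and $T$ therefore matches the $(d_1+d_2)^{3/2}\sqrt{rT}$ rate of \citep{lu2021low}.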

Paper Structure (45 sections, 23 theorems, 144 equations, 2 figures, 2 tables, 6 algorithms)

Key Result

Theorem 4.1

(Bounds for GLM) For any low-rank generalized linear model with samples $X_1,\dots,X_{T_1}$ drawn from $\mathcal{X}$ according to $\mathcal{D}$ in Assumption assu_sampling, and assuming that Assumptions assu_bound and assu_link hold, the optimal solution to the nuclear norm regularization problem lo satisfies the following with probability at least $1-\delta$: for $C_1=36(4\sigma_0^2+S_f^2)$ and some nonze

Figures (2)

  • Figure 1: Plots of regret curves of algorithms G-ESTS, G-ESTT, SGD-TS and LowESTR under four settings ($480$ arms). (a): diagonal $\Theta^*$, $d_1=d_2=10$, $r=1$; (b): diagonal $\Theta^*$, $d_1=d_2=12$, $r=1$; (c): non-diagonal $\Theta^*$, $d_1=d_2=10$, $r=2$; (d): non-diagonal $\Theta^*$, $d_1=d_2=12$, $r=2$.
  • Figure 2: Plots of regret curves of algorithms G-ESTT, G-ESTS, SGD-TS and LowESTR under four settings ($1000$ arms). (a): diagonal $\Theta^*$, $d_1=d_2=10$, $r=1$; (b): diagonal $\Theta^*$, $d_1=d_2=12$, $r=1$; (c): non-diagonal $\Theta^*$, $d_1=d_2=10$, $r=2$; (d): non-diagonal $\Theta^*$, $d_1=d_2=12$, $r=2$.

Theorems & Definitions (27)

  • Definition 3.1
  • Definition 3.2
  • Theorem 4.1
  • Theorem 4.2
  • Theorem 4.3
  • Lemma A.1
  • Lemma B.1
  • Lemma B.2
  • Lemma B.3
  • Lemma B.4
  • ...and 17 more