
Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits

Jiabin Lin, Shana Moothedath, Namrata Vaswani

TL;DR

This work addresses fast, sample-efficient learning for $T$ contextual linear bandits that share a common $r$-dimensional representation. It introduces LRRL-AltGDMin, an alternating gradient-descent and minimization algorithm with spectral initialization that recovers the low-rank feature matrix $\Theta^* = B^* W^*$, and provides regret guarantees and sample- and time-complexity analyses under an iid Gaussian design and incoherence assumptions. The theoretical results include exponential error decay across epochs and a regret bound of $\mathcal{R}_{N,T} \le 2 \mu \sigma_{\max}^* \sqrt{r N T \log(1/\delta)} (1 + \log\log N)$, with initialization and convergence conditions that depend on the noise-to-signal ratio (NSR). Empirical evaluations on synthetic data and MNIST show that LRRL-AltGDMin outperforms the method-of-moments (MoM) estimator, trace-norm relaxations, and naive per-task Thompson sampling, especially as the number of tasks $T$ grows, highlighting the practical value of shared low-rank representations for multi-task bandits.
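To make the scaling of the regret bound concrete, here is a small Python helper that simply evaluates the stated expression. We read $\mu$ as the incoherence parameter and $\sigma_{\max}^*$ as the largest singular value of $\Theta^*$; that reading, the function name `regret_bound`, and the sample inputs are our assumptions. If $\mathcal{R}_{N,T}$ is the total regret over all $T$ tasks, the bound grows like $\sqrt{T}$, so the per-task share $\mathcal{R}_{N,T}/T$ shrinks like $1/\sqrt{T}$.

```python
import math


def regret_bound(mu, sigma_max, r, N, T, delta):
    """Evaluate 2 * mu * sigma_max * sqrt(r N T log(1/delta)) * (1 + log log N).

    mu, sigma_max: read here as the incoherence parameter and the largest
    singular value of Theta* (our interpretation of the symbols).
    """
    if not (0 < delta < 1) or N <= 1:
        raise ValueError("need 0 < delta < 1 and N > 1 so that log log N is defined")
    return (2 * mu * sigma_max * math.sqrt(r * N * T * math.log(1 / delta))
            * (1 + math.log(math.log(N))))


# The bound grows like sqrt(T), so the per-task value R / T shrinks like 1 / sqrt(T).
for T in (25, 100, 400):
    R = regret_bound(mu=1.0, sigma_max=1.0, r=4, N=200, T=T, delta=0.01)
    print(f"T={T:4d}  bound={R:10.1f}  per-task={R / T:8.2f}")
```

Holding the other inputs fixed, quadrupling $T$ doubles the bound and halves its per-task share.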

Abstract

We study how representation learning can improve the learning efficiency of contextual bandit problems. We consider the setting in which we play T contextual linear bandits of dimension d simultaneously, and these T bandit tasks share a common linear representation of dimension r much smaller than d. We present a new estimator, based on alternating projected gradient descent (GD) and minimization, to recover a low-rank feature matrix. Using this estimator, we present a multi-task learning algorithm for linear contextual bandits and prove a regret bound for it. We present experiments comparing the performance of our algorithm against benchmark algorithms.
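The estimator described above can be illustrated with a minimal NumPy sketch of the generic AltGDMin template: spectral initialization, a per-task least-squares minimization for $W$, and a projected GD step (with QR re-orthonormalization) for $B$. It runs on a noiseless multi-task linear regression model $y_t = X_t B^* w_t^*$ and is not the authors' LRRL-AltGDMin: the bandit layer, the epoch structure, any truncation in the initialization, and the paper's exact step size are omitted. The step size $0.5/(m \|W\|_2^2)$, the function names, and the data sizes are our assumptions.

```python
import numpy as np


def altgdmin(X, Y, r, iters=200, step=0.5):
    """AltGDMin sketch for y_t = X_t @ B @ w_t (noiseless), rank-r Theta = B W^T.

    X: (T, m, d) per-task designs; Y: (T, m) per-task responses.
    Returns B (d x r, orthonormal columns) and W (T x r, row t is w_t).
    """
    T, m, d = X.shape
    # Spectral initialization (no truncation): top-r left singular vectors
    # of Theta0 = (1/m) [X_1^T y_1, ..., X_T^T y_T], whose mean is Theta*.
    theta0 = np.einsum("tmd,tm->dt", X, Y) / m
    B = np.linalg.svd(theta0, full_matrices=False)[0][:, :r]

    def minimize_w(B):
        # Min step: w_t = argmin_w ||y_t - X_t B w||^2 for every task,
        # solved in batch through the r x r normal equations.
        XB = X @ B                                   # (T, m, r)
        XBt = np.transpose(XB, (0, 2, 1))            # (T, r, m)
        W = np.linalg.solve(XBt @ XB, XBt @ Y[:, :, None])[:, :, 0]
        return XB, W

    for _ in range(iters):
        XB, W = minimize_w(B)
        res = np.einsum("tmr,tr->tm", XB, W) - Y     # X_t B w_t - y_t
        # Gradient of f(B) = 0.5 * sum_t ||y_t - X_t B w_t||^2 at fixed W.
        grad = np.einsum("tmd,tm,tr->dr", X, res, W)
        eta = step / (m * np.linalg.norm(W, 2) ** 2)  # ~ 1 / (m * sigma_max^2)
        # Projected GD: gradient step, then QR to re-orthonormalize B.
        B = np.linalg.qr(B - eta * grad)[0]
    return B, minimize_w(B)[1]


# Usage on synthetic data with fewer samples per task (m) than dimensions (d).
rng = np.random.default_rng(0)
d, r, T, m = 20, 2, 200, 15
B_star = np.linalg.qr(rng.standard_normal((d, r)))[0]
W_star = rng.standard_normal((T, r))
X = rng.standard_normal((T, m, d))
Y = np.einsum("tmd,dr,tr->tm", X, B_star, W_star)

B_hat, W_hat = altgdmin(X, Y, r)
subspace_err = np.linalg.norm(B_star - B_hat @ (B_hat.T @ B_star))
rel_err = np.linalg.norm(B_hat @ W_hat.T - B_star @ W_star.T) / np.linalg.norm(B_star @ W_star.T)
print(f"subspace error {subspace_err:.2e}, relative error {rel_err:.2e}")
```

Each task has only $m = 15 < d = 20$ samples, so no task could be solved alone; pooling the $T$ tasks through the shared $B$ is what makes recovery possible in this sketch.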

Paper Structure

This paper contains 18 sections, 12 theorems, 91 equations, 2 figures, and 3 algorithms.

Key Result

Theorem 5.1

Assume that the iid Gaussian design and incoherence assumptions hold. Assume that $\sigma_\eta^2 \leqslant c \frac{\delta_0^2}{r^2 \kappa^4 \mathcal{G}_1} \| \theta_t^\star \|^2$. Then with probability at least $1 - \exp(\log T - c \mathcal{G}_1) - \exp(d - \frac{c \delta_0^2 \mathcal{G}_1 T}{r^2 \m
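Assuming $\theta_t^\star \neq 0$, dividing both sides of the noise assumption in Theorem 5.1 by $\|\theta_t^\star\|^2$ turns it into an upper bound on a per-task noise-to-signal ratio. This is our reading of the NSR-dependent conditions mentioned in the TL;DR; $c$ is the unspecified constant from the theorem and $\mathcal{G}_1$ is the same quantity that appears there.

```latex
\[
\sigma_\eta^2 \le c\,\frac{\delta_0^2}{r^2 \kappa^4 \mathcal{G}_1}\,\|\theta_t^\star\|^2
\quad\Longleftrightarrow\quad
\mathrm{NSR}_t := \frac{\sigma_\eta^2}{\|\theta_t^\star\|^2} \le \frac{c\,\delta_0^2}{r^2 \kappa^4 \mathcal{G}_1}.
\]
```

Read this way, the tolerated NSR shrinks quadratically in the rank $r$ and with the fourth power of the condition number $\kappa$: doubling $r$ divides it by 4, and doubling $\kappa$ divides it by 16.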

Figures (2)

  • Figure 1: Synthetic data 1: We set $d = 100$, $K = 5$, $N = 200$, and noise variance $10^{-6}$, with $M = 4$ epochs of $50$ data samples each. We varied the number of tasks as $T = 10, 25, 50, 75, 100$ and the rank of the feature matrix as $r = 2, 4, 8$. As shown in Figures fig:1, fig:2, and fig:3, our proposed approach outperforms the existing benchmarks. MNIST data: We set $d = 784$, $K = 2$, $N = 5000$, and noise variance $10^{-6}$, with $M = 5$ epochs of $1000$ data samples each. We varied the number of tasks as $T = 10, 45$ and the rank of the feature matrix as $r = 2, 4, 8$. The MNIST results are shown in Figures fig:4, fig:5, and fig:6. Synthetic data 2: We consider a smaller problem dimension and also compare with the trace-norm relaxation method. In Figures fig:7, fig:8, and fig:9, we set $d = 20$, $K = 5$, and $N = 40$, with $M = 4$ epochs of $10$ data samples each.
  • Figure 2: Synthetic data 1: In Figures fig:est1 and fig:est2, we set $d = 100$, $T = 100$, $K = 5$, $N = 200$, and noise variance $10^{-6}$, ran $L = 2000$ GD iterations, and used $M = 4$ epochs of $50$ data samples each. In Figure fig:est5, we separately show per-task regret versus the number of tasks for $d = 100$, $K = 5$, $N = 100$ (also shown in Figure fig:9) to illustrate the sublinear decay. Synthetic data 2: We consider a smaller problem dimension and also compare with the trace-norm relaxation method. In Figures fig:est3 and fig:est4, we set $d = 20$, $T = 30$, $K = 5$, $N = 40$, and noise variance $10^{-6}$, ran $L = 2000$ GD iterations, and used $M = 4$ epochs of $10$ data samples each. As expected, the estimation error of our proposed algorithm saturates close to the noise level.
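The experimental setups in these captions share one bookkeeping rule: $N$ equals $M$ epochs times the number of data samples per epoch. The small Python record below makes that arithmetic explicit for each setup as reported; the class name, field names, and setup labels are hypothetical, and we do not interpret $K$ beyond the value the captions give.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container for the settings listed in the figure captions."""
    name: str
    d: int          # feature dimension
    K: int          # K as reported in the captions
    N: int          # N as reported in the captions
    M: int          # number of epochs
    per_epoch: int  # data samples per epoch

    def is_consistent(self) -> bool:
        # The captions describe N as M epochs of `per_epoch` samples each.
        return self.M * self.per_epoch == self.N


CONFIGS = [
    ExperimentConfig("synthetic-1", d=100, K=5, N=200, M=4, per_epoch=50),
    ExperimentConfig("mnist", d=784, K=2, N=5000, M=5, per_epoch=1000),
    ExperimentConfig("synthetic-2", d=20, K=5, N=40, M=4, per_epoch=10),
]

for cfg in CONFIGS:
    print(cfg.name, cfg.M, "x", cfg.per_epoch, "=", cfg.N, cfg.is_consistent())
```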

Theorems & Definitions (18)

  • Theorem 5.1
  • Theorem 5.2
  • Theorem 5.3
  • Theorem 5.4
  • Proposition 1.1: Theorem 2.8.1 of Vershynin (2018)
  • Proposition 1.2: Chernoff bound for Gaussian
  • Proposition 1.3: $\epsilon$-net argument for bounding $\max_{z \in S_d, v \in S_r} |z^\top M v|$
  • Proof
  • Proposition 2.1
  • Proof
  • ...and 8 more
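For reference, the standard scalar Chernoff argument for a Gaussian is shown below: apply Markov's inequality to $e^{\lambda X}$, use the Gaussian moment generating function, and optimize over $\lambda$ (the minimizer is $\lambda = t/\sigma^2$). Proposition 1.2 may state a different form, for example for sums or several variables, so this is only the textbook one-dimensional case.

```latex
\[
\Pr(X \ge t) \le \inf_{\lambda>0} e^{-\lambda t}\,\mathbb{E}\,e^{\lambda X}
= \inf_{\lambda>0} e^{-\lambda t + \lambda^2\sigma^2/2}
= e^{-t^2/(2\sigma^2)}, \qquad X \sim \mathcal{N}(0,\sigma^2),\ t > 0.
\]
```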