Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits

Jiabin Lin; Shana Moothedath; Namrata Vaswani


TL;DR

This work addresses fast, sample-efficient learning for $T$ contextual linear bandits that share a common $r$-dimensional representation. It introduces LRRL-AltGDMin, an alternating gradient-descent and minimization algorithm with spectral initialization to recover the low-rank feature matrix $\Theta^* = B^* W^*$, and provides regret guarantees and sample/time complexity analyses under iid Gaussian design and incoherence. Theoretical results include exponential error decay across epochs and a regret bound of $\mathcal{R}_{N,T} \le 2 \mu \sigma_{\max}^* \sqrt{r N T \log(1/\delta)} (1 + \log\log N)$, with NSR-dependent initialization and convergence conditions. Empirical evaluations on synthetic data and MNIST demonstrate that LRRL-AltGDMin outperforms MoM, trace-norm relaxations, and naive per-task Thompson sampling, especially as the number of tasks $T$ grows, highlighting the practical impact of shared low-rank representations for multi-task bandits.
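As a rough illustration of how the stated regret bound scales, the sketch below evaluates its right-hand side. The function name `regret_bound`, the use of natural logarithms, and all parameter values are assumptions made for illustration; they are not taken from the paper.

```python
import math


def regret_bound(mu, sigma_max, r, N, T, delta):
    """Evaluate 2*mu*sigma_max*sqrt(r*N*T*log(1/delta)) * (1 + log log N).

    Natural logarithms are assumed here; the summary does not state the base.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if N <= 1:
        raise ValueError("N must exceed 1 so that log log N is defined")
    root = math.sqrt(r * N * T * math.log(1.0 / delta))
    return 2.0 * mu * sigma_max * root * (1.0 + math.log(math.log(N)))


# Hypothetical parameter values, chosen only to show the scaling in T.
common = dict(mu=1.0, sigma_max=1.0, r=2, N=200, delta=0.05)
bound_T25 = regret_bound(T=25, **common)
bound_T100 = regret_bound(T=100, **common)

# The bound grows like sqrt(T), so the per-task share shrinks like 1/sqrt(T).
print("total bound ratio (T=100 vs T=25):", bound_T100 / bound_T25)
print("per-task bound at T=25 :", bound_T25 / 25)
print("per-task bound at T=100:", bound_T100 / 100)
```

Because the bound depends on $T$ only through $\sqrt{T}$, quadrupling the number of tasks doubles the total bound and halves the per-task share.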

Abstract

We study how representation learning can improve the learning efficiency of contextual bandit problems. We consider the setting where we play $T$ contextual linear bandits of dimension $d$ simultaneously, and these $T$ bandit tasks share a common linear representation whose dimension $r$ is much smaller than $d$. We present a new algorithm based on an alternating projected gradient descent (GD) and minimization estimator to recover a low-rank feature matrix. Using the proposed estimator, we present a multi-task learning algorithm for linear contextual bandits and prove a regret bound for it. We also present experiments comparing the performance of our algorithm against benchmark algorithms.
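To make the estimator concrete, here is a minimal NumPy sketch of an alternating GD and minimization loop for recovering a low-rank matrix $\Theta^* = B^* W^*$ from per-task linear measurements. It is an offline illustration only, not the paper's LRRL-AltGDMin: the action-selection (bandit) part is omitted, the spectral initialization is untruncated, and the step-size rule, the function names (`make_tasks`, `spectral_init`, `alt_gd_min`), and the problem sizes are all assumptions.

```python
import numpy as np


def make_tasks(d, r, T, n, noise_std, rng):
    """Hypothetical synthetic data: y_t = X_t @ theta_t + noise, Theta = B @ W."""
    B_star, _ = np.linalg.qr(rng.standard_normal((d, r)))  # orthonormal basis
    W_star = rng.standard_normal((r, T))
    Theta_star = B_star @ W_star
    X = rng.standard_normal((T, n, d))                      # iid Gaussian design
    Y = np.einsum("tnd,dt->tn", X, Theta_star)
    Y += noise_std * rng.standard_normal((T, n))
    return X, Y, Theta_star


def spectral_init(X, Y, r):
    """Top-r left singular vectors of the matrix whose column t is X_t^T y_t / n."""
    n = X.shape[1]
    Theta0 = np.einsum("tnd,tn->dt", X, Y) / n
    U, _, _ = np.linalg.svd(Theta0, full_matrices=False)
    return U[:, :r]


def update_W(X, Y, B):
    """Minimization step: per-task least squares for w_t with B held fixed."""
    cols = [np.linalg.lstsq(X[t] @ B, Y[t], rcond=None)[0] for t in range(X.shape[0])]
    return np.stack(cols, axis=1)


def alt_gd_min(X, Y, r, n_iters=200, step=0.5):
    """Alternate least squares over W with a projected GD step over B."""
    n = X.shape[1]
    B = spectral_init(X, Y, r)
    for _ in range(n_iters):
        W = update_W(X, Y, B)
        resid = np.einsum("tnd,dt->tn", X, B @ W) - Y
        # Gradient of 0.5 * sum_t ||y_t - X_t B w_t||^2 with respect to B.
        grad = np.einsum("tnd,tn,rt->dr", X, resid, W)
        # Assumed step-size rule: scale by n and the squared spectral norm of W.
        eta = step / (n * np.linalg.norm(W, 2) ** 2)
        B, _ = np.linalg.qr(B - eta * grad)  # projection: re-orthonormalize
    return B, update_W(X, Y, B)


rng = np.random.default_rng(0)
X, Y, Theta_star = make_tasks(d=20, r=2, T=100, n=50, noise_std=1e-3, rng=rng)
B_hat, W_hat = alt_gd_min(X, Y, r=2)
rel_err = np.linalg.norm(B_hat @ W_hat - Theta_star) / np.linalg.norm(Theta_star)
print("relative error of recovered Theta:", rel_err)
```

The QR factorization after each gradient step keeps the columns of the basis estimate orthonormal, which is what makes the GD step a projected one. Each task uses only $n = 50$ samples per task here, yet the shared basis is estimated from all $T$ tasks together.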


Paper Structure (18 sections, 12 theorems, 91 equations, 2 figures, 3 algorithms)


Introduction
Problem Setting
Problem Formulation
Preliminaries
Contributions
Related Work
The Proposed Algorithm: LRRL-AltGDMin
Analysis of LRRL-AltGDMin
Simulations
Datasets
Results and Discussions
Conclusion and Future Work
Preliminaries
Guarantees for LRRL-AltGDMin Estimator
Proof of Theorem \ref{new_3}
...and 3 more sections

Key Result

Theorem 5.1

Assume that Assumptions assume:iid and assume:incoherence hold. Assume that $\sigma_\eta^2 \leqslant c \frac{\delta_0^2}{r^2 \kappa^4 {\pazocal{G}}_1} \| \theta_t^\star \|^2$. Then with probability at least $1 - \exp(\log T - c {\pazocal{G}}_1) - \exp(d - \frac{c \delta_0^2 {\pazocal{G}}_1 T}{r^2 \m

Figures (2)

Figure 1: Synthetic data 1: We set $d = 100$, $K = 5$, $N=200$, and noise variance $= 10^{-6}$, using $M=4$ epochs of $50$ data samples each. We varied the number of tasks as $T=10,25,50,75,100$ and the rank of the feature matrix as $r=2,4,8$. As shown in the plots (Figures \ref{fig:1}, \ref{fig:2}, and \ref{fig:3}), our proposed approach outperforms the existing benchmarks. MNIST data: We set $d = 784$, $K = 2$, $N=5000$, and noise variance $= 10^{-6}$, using $M=5$ epochs of $1000$ data samples each. We varied the number of tasks as $T=10, 45$ and the rank of the feature matrix as $r=2,4,8$. The plots for MNIST data are presented in Figures \ref{fig:4}, \ref{fig:5}, and \ref{fig:6}. Synthetic data 2: We consider a smaller problem dimension here and also compare with the trace-norm relaxation method. In Figures \ref{fig:7}, \ref{fig:8}, and \ref{fig:9}, we set $d = 20$ and $K = 5$, using $M=4$ epochs of $10$ data samples each, so $N=40$.
Figure 2: Synthetic data 1: In Figures \ref{fig:est1} and \ref{fig:est2}, we set $d = 100$, $T=100$, $K = 5$, $N=200$, and noise variance $= 10^{-6}$. We ran $L=2000$ GD iterations and used $M=4$ epochs of $50$ data samples each. In Figure \ref{fig:est5}, we separately present the per-task regret versus the number of tasks for $d=100$, $K = 5$, $N=100$ (also shown in Figure \ref{fig:9}) to showcase the sublinear decay. Synthetic data 2: We consider a smaller problem dimension and also compare with the trace-norm relaxation method. In Figures \ref{fig:est3} and \ref{fig:est4}, we set $d = 20$, $T=30$, $K = 5$, and noise variance $= 10^{-6}$. We ran $L=2000$ GD iterations and used $M=4$ epochs of $10$ data samples each, so $N=40$. As expected, the estimation error of our proposed algorithm saturates close to the noise level.

Theorems & Definitions (18)

Theorem 5.1
Theorem 5.2
Theorem 5.3
Theorem 5.4
Proposition 1.1: Theorem 2.8.1, vershynin2018high
Proposition 1.2: Chernoff bound for Gaussian
Proposition 1.3: Epsilon-netting for bounding $\max_{z \in S_d, v \in S_r} |z^\top M v|$
proof
Proposition 2.1
proof
...and 8 more
