Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits
Jiabin Lin, Shana Moothedath, Namrata Vaswani
TL;DR
This work addresses fast, sample-efficient learning for $T$ contextual linear bandits that share a common $r$-dimensional representation. It introduces LRRL-AltGDMin, an alternating gradient-descent and minimization algorithm with spectral initialization that recovers the low-rank feature matrix $\Theta^* = B^* W^*$. It also provides regret guarantees and sample- and time-complexity analyses under an i.i.d. Gaussian design and an incoherence assumption. The theoretical results show that the estimation error decays exponentially across epochs and yield the regret bound $\mathcal{R}_{N,T} \le 2 \mu \sigma_{\max}^* \sqrt{r N T \log(1/\delta)} (1 + \log\log N)$; the initialization and convergence conditions depend on the noise-to-signal ratio (NSR). Empirical evaluations on synthetic data and MNIST show that LRRL-AltGDMin outperforms method-of-moments (MoM) estimators, trace-norm relaxations, and naive per-task Thompson sampling, especially as the number of tasks $T$ grows, highlighting the practical value of shared low-rank representations for multi-task bandits.
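The alternating structure described above can be illustrated with a minimal NumPy sketch of generic AltGDMin for the model $y_t = X_t \theta_t$ with $\Theta = BW$ of rank $r$. It is not the authors' implementation: the simplified spectral initialization (no truncation of large rewards), the step-size constant `c_eta`, the iteration count, and the function names `solve_w` and `altgdmin` are all assumptions made for illustration. Each iteration solves a small least-squares problem for every $w_t$ given $B$, takes one gradient step on $B$, and re-orthonormalizes $B$ with a QR decomposition.

```python
import numpy as np


def solve_w(X, Y, B):
    """Minimization step: per-task least squares for w_t with the basis B fixed."""
    A = np.einsum("tmd,dr->tmr", X, B)               # A_t = X_t B, shape (T, m, r)
    G = np.einsum("tmr,tms->trs", A, A)              # A_t^T A_t, shape (T, r, r)
    h = np.einsum("tmr,tm->tr", A, Y)                # A_t^T y_t, shape (T, r)
    return np.linalg.solve(G, h[..., None])[..., 0]  # row t is w_t^T, shape (T, r)


def altgdmin(X, Y, r, iters=200, c_eta=0.4):
    """Hypothetical AltGDMin sketch. X: (T, m, d) contexts, Y: (T, m) rewards.

    Returns an estimate of Theta = B W with shape (d, T).
    """
    T, m, d = X.shape
    # Simplified spectral initialization: column t of Theta0 is (1/m) X_t^T y_t,
    # whose expectation is theta_t when the rows of X_t are standard Gaussian.
    Theta0 = np.einsum("tmd,tm->dt", X, Y) / m
    U, s, _ = np.linalg.svd(Theta0, full_matrices=False)
    B = U[:, :r]
    eta = c_eta / (m * s[0] ** 2)  # step size ~ c / (m * sigma_max^2), sigma_max estimated from Theta0
    for _ in range(iters):
        W = solve_w(X, Y, B)                              # exact minimization over W
        resid = np.einsum("tmd,dr,tr->tm", X, B, W) - Y   # X_t B w_t - y_t
        grad = np.einsum("tmd,tm,tr->dr", X, resid, W)    # sum_t X_t^T resid_t w_t^T
        B, _ = np.linalg.qr(B - eta * grad)               # gradient step, then QR projection
    W = solve_w(X, Y, B)
    return B @ W.T


# Usage on noise-free synthetic data with fewer samples per task (m) than dimensions (d).
rng = np.random.default_rng(0)
d, r, T, m = 30, 2, 300, 20
B_star, _ = np.linalg.qr(rng.standard_normal((d, r)))
Theta_star = B_star @ rng.standard_normal((r, T))
X = rng.standard_normal((T, m, d))
Y = np.einsum("tmd,dt->tm", X, Theta_star)
Theta_hat = altgdmin(X, Y, r)
rel_err = np.linalg.norm(Theta_hat - Theta_star) / np.linalg.norm(Theta_star)
print(f"relative error: {rel_err:.2e}")
```

Because $m < d$, no single task could be recovered by least squares on its own data; pooling the $T$ tasks through the shared basis $B$ is what makes recovery possible in this toy setup.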
Abstract
We study how representation learning can improve the learning efficiency of contextual bandit problems. Specifically, we consider the setting in which T contextual linear bandits of dimension d are played simultaneously, and these T bandit tasks share a common linear representation whose dimension r is much smaller than d. We present a new estimator, based on alternating projected gradient descent (GD) and minimization, that recovers the low-rank feature matrix. Using this estimator, we develop a multi-task learning algorithm for linear contextual bandits and prove a regret bound for it. We present experiments comparing the performance of our algorithm with benchmark algorithms.
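To make the setting concrete, the sketch below simulates T linear contextual bandits whose parameters $\theta_t = B^* w_t$ share an r-dimensional basis, and plays each task greedily with respect to a current estimate. It is an illustrative assumption, not the paper's algorithm: the number of arms `K`, the noise level `sigma`, the ridge parameter `lam`, the doubling epoch schedule, and the function name `simulate` are hypothetical, and the low-rank step is a plain rank-r truncated SVD of per-task ridge estimates standing in for the AltGDMin estimator.

```python
import numpy as np


def simulate(d=20, r=2, T=50, K=5, N=200, sigma=0.1, lam=1.0, low_rank=True, seed=0):
    """Hypothetical multi-task greedy linear bandit; returns cumulative regret (length N)."""
    rng = np.random.default_rng(seed)
    B_star, _ = np.linalg.qr(rng.standard_normal((d, r)))
    Theta_star = B_star @ rng.standard_normal((r, T))  # column t is theta_t = B* w_t
    G = np.tile(lam * np.eye(d), (T, 1, 1))            # per-task ridge Gram matrices (T, d, d)
    h = np.zeros((d, T))                               # per-task sums of x * y
    Theta_hat = np.zeros((d, T))
    regret = np.zeros(N)
    idx = np.arange(T)
    for n in range(N):
        ctx = rng.standard_normal((T, K, d))                # K candidate contexts per task
        est = np.einsum("tkd,dt->tk", ctx, Theta_hat)       # estimated expected rewards
        true = np.einsum("tkd,dt->tk", ctx, Theta_star)     # true expected rewards
        a = est.argmax(axis=1)                              # greedy action for each task
        x = ctx[idx, a]                                     # chosen contexts, shape (T, d)
        y = true[idx, a] + sigma * rng.standard_normal(T)   # noisy observed rewards
        regret[n] = (true.max(axis=1) - true[idx, a]).sum() # summed over tasks
        G += np.einsum("td,te->tde", x, x)
        h += (x * y[:, None]).T
        if ((n + 1) & n) == 0:  # re-estimate when n + 1 is a power of two (doubling epochs)
            Theta_ridge = np.linalg.solve(G, h.T[..., None])[..., 0].T  # (d, T)
            if low_rank:
                U, s, Vt = np.linalg.svd(Theta_ridge, full_matrices=False)
                Theta_hat = (U[:, :r] * s[:r]) @ Vt[:r]  # rank-r truncation (stand-in estimator)
            else:
                Theta_hat = Theta_ridge
    return regret.cumsum()


for flag in (False, True):
    R = simulate(low_rank=flag)
    print("low-rank" if flag else "per-task", round(float(R[-1]), 2))
```

Per-round regret is always nonnegative because the chosen arm's expected reward never exceeds the best arm's; any gap between the two printed totals depends on the assumed dimensions and seed and should not be read as a reproduction of the paper's experiments.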
