Learning Shared Representations for Multi-Task Linear Bandits

Jiabin Lin, Shana Moothedath

Abstract

Multi-task representation learning is an approach that learns shared latent representations across related tasks, facilitating knowledge transfer and improving sample efficiency. This paper introduces a novel approach to multi-task representation learning in linear bandits. We consider a setting with $T$ concurrent linear bandit tasks, each with feature dimension $d$, that share a common latent representation of dimension $r \ll \min\{d,T\}$, capturing their underlying relatedness. We propose a new Optimism in the Face of Uncertainty Linear (OFUL) algorithm that leverages shared low-rank representations to enhance decision-making in a sample-efficient manner. Our algorithm first collects data through an exploration phase, estimates the shared model via spectral initialization, and then conducts OFUL-based learning over a newly constructed confidence set. We provide theoretical guarantees for the confidence set and prove that the unknown reward vectors lie within the confidence set with high probability. We derive cumulative regret bounds and show that the proposed approach achieves $\tilde{O}(\sqrt{drNT})$ regret, a significant improvement over solving the $T$ tasks independently, which incurs $\tilde{O}(dT\sqrt{N})$ regret. We present numerical simulations that validate the performance of our algorithm for different problem sizes.
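The three phases described in the abstract can be sketched in a minimal simulation: uniform exploration, spectral initialization (top-$r$ left singular vectors of the stacked per-task estimates), and per-task OFUL on features projected into the learned subspace. The Gaussian exploration, plain least-squares estimates, and the constants (`lam`, `beta`, noise level) below are simplifying assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, r, N1, N, K, sigma = 20, 10, 2, 40, 200, 15, 0.1

# Ground truth: each task's reward vector theta_t = B w_t lies in a shared
# r-dimensional column space spanned by B (d x r).
B, _ = np.linalg.qr(rng.standard_normal((d, r)))
Theta = B @ rng.standard_normal((r, T))          # d x T matrix of task vectors

# Phase 1: uniform exploration, then per-task least squares.
Theta_hat = np.zeros((d, T))
for t in range(T):
    X = rng.standard_normal((N1, d)) / np.sqrt(d)          # exploration contexts
    y = X @ Theta[:, t] + sigma * rng.standard_normal(N1)  # noisy rewards
    Theta_hat[:, t] = np.linalg.lstsq(X, y, rcond=None)[0]

# Spectral initialization: top-r left singular vectors of the stacked
# estimates approximate the shared subspace.
U, _, _ = np.linalg.svd(Theta_hat, full_matrices=False)
B_hat = U[:, :r]

# Phase 2: OFUL per task on r-dimensional projected features.
lam, beta = 1.0, 1.0
regret = 0.0
for t in range(T):
    V = lam * np.eye(r)                          # regularized Gram matrix
    bvec = np.zeros(r)
    for _ in range(N):
        arms = rng.standard_normal((K, d)) / np.sqrt(d)
        Z = arms @ B_hat                         # project arms to the subspace
        w_hat = np.linalg.solve(V, bvec)
        Vinv = np.linalg.inv(V)
        ucb = Z @ w_hat + beta * np.sqrt(np.einsum('ij,jk,ik->i', Z, Vinv, Z))
        i = int(np.argmax(ucb))                  # optimistic arm choice
        y = arms[i] @ Theta[:, t] + sigma * rng.standard_normal()
        V += np.outer(Z[i], Z[i])
        bvec += y * Z[i]
        regret += np.max(arms @ Theta[:, t]) - arms[i] @ Theta[:, t]

print(round(regret / T, 2))                      # average per-task regret
```

The payoff of the shared representation is visible in the inner loop: each OFUL update runs in the $r$-dimensional projected space, so the confidence ellipsoid is over $r$ rather than $d$ coordinates.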

Paper Structure

This paper contains 9 sections, 3 theorems, 30 equations, 1 figure, and 1 algorithm.

Key Result

Proposition C.1

Pick a $\delta_0 < 0.1$. Define the noise-to-signal ratio as $\mathrm{NSR} := \frac{T \sigma^2}{\sigma_{\min}^{\star 2}}$. If $N_1 T > C \mu^2 \kappa^2 \left(\frac{d r \kappa^2}{\delta_0^2} + \frac{d}{\delta_0^2} \mathrm{NSR}\right)$, then with probability at least $1-\exp(-c(d+T))$, the subspace distance ...
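The sample-complexity condition in Proposition C.1 is a mechanical inequality and can be evaluated numerically. The constants below ($C$, the incoherence $\mu$, the condition number $\kappa$, $\sigma$, $\sigma_{\min}^{\star}$) are placeholder values assumed for illustration, not values from the paper.

```python
# Illustrative check of the Proposition C.1 sample condition. The constants
# C, mu (incoherence), kappa (condition number), sigma, and sigma_min_star
# are assumed placeholder values, not taken from the paper.
C, mu, kappa, delta0 = 1.0, 1.5, 2.0, 0.05
d, r, T, N1 = 100, 2, 100, 20
sigma, sigma_min_star = 0.1, 1.0

nsr = T * sigma**2 / sigma_min_star**2           # noise-to-signal ratio
threshold = C * mu**2 * kappa**2 * (d * r * kappa**2 / delta0**2
                                    + d / delta0**2 * nsr)
print(N1 * T > threshold)                        # does the condition hold?
```

At these placeholder values the condition fails, reflecting the $1/\delta_0^2$ blow-up: halving the target subspace distance quadruples the exploration budget $N_1 T$ needed.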

Figures (1)

  • Figure: Per-task cumulative regret vs. round, shown in three plots. Figure 1 varies the feature dimension $d$ over $\{100, 20, 300\}$; Figure 2 varies the rank $r$ over $\{2, 4, 8\}$; Figure 3 varies the number of tasks $T$ over $\{200, 400, 800\}$. Unless varied, the parameters are set as $d=100$, $T=100$, $r=2$, $N_1=20$, $N=600$.
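Plugging the figure's baseline parameters into the two regret rates gives a back-of-envelope sense of the gain from sharing the representation (constants and log factors ignored):

```python
import math

# Compare the two regret rates at the simulation's baseline setting.
d, T, r, N = 100, 100, 2, 600

shared = math.sqrt(d * r * N * T)       # \tilde{O}(sqrt(drNT)), shared representation
independent = d * T * math.sqrt(N)      # \tilde{O}(dT sqrt(N)), independent per-task OFUL

print(round(independent / shared, 1))   # improvement factor, equals sqrt(dT/r)
```

The ratio simplifies to $\sqrt{dT/r}$, so the advantage grows with the number of tasks and the ambient dimension, and shrinks as the latent rank approaches $\min\{d,T\}$.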

Theorems & Definitions (6)

  • Proposition C.1 (Theorem 2.2 of singh2024noisy)
  • Theorem C.2
  • Proof of Theorem C.2
  • Theorem C.3
  • Proof of Theorem C.3
  • Definition 1.1