Provable Multi-Task Reinforcement Learning: A Representation Learning Framework with Low Rank Rewards

Yaoze Guo, Shana Moothedath

Abstract

Multi-task representation learning (MTRL) is an approach that learns shared latent representations across related tasks, facilitating collaborative learning that improves overall learning efficiency. This paper studies MTRL for multi-task reinforcement learning (RL), where multiple tasks share the same state-action space and transition probabilities but differ in their rewards. We consider $T$ linear Markov Decision Processes (MDPs) whose reward functions and transition dynamics admit linear feature embeddings of dimension $d$. The relatedness among the tasks is captured by a low-rank structure on the reward matrices. Learning shared representations across multiple RL tasks is challenging due to the complex, policy-dependent nature of the data, which leads to a temporal propagation of errors. Our approach adopts a reward-free reinforcement learning framework to first learn a data-collection policy. This policy then informs an exploration strategy for estimating the unknown reward matrices. Importantly, the data collected under this well-designed policy enable accurate estimation, which ultimately supports the learning of a near-optimal policy. Unlike existing approaches that rely on restrictive assumptions such as Gaussian features, incoherence conditions, or access to optimal solutions, we propose a low-rank matrix estimation method that operates under the more general feature distributions encountered in RL settings. Our theoretical analysis establishes that accurate low-rank matrix recovery is achievable under these relaxed assumptions, and we characterize the relationship between representation error and sample complexity. Leveraging the learned representation, we construct near-optimal policies and prove a regret bound. Experimental results demonstrate that our method effectively learns robust shared representations and task dynamics from finite data.
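To make the setting concrete, the following is a minimal NumPy sketch (not the paper's implementation) of the low-rank reward model described above: each of the $T$ tasks has a $d$-dimensional reward parameter, and stacking these parameters yields a rank-$r$ matrix $\Theta^\star = B^\star W^\star$ whose column space $B^\star$ is the shared representation. All names and dimensions here are illustrative assumptions.

```python
import numpy as np

# Illustrative dimensions: d-dimensional features, T tasks, shared rank r << min(d, T).
d, T, r = 100, 100, 2
rng = np.random.default_rng(0)

# Shared representation B* (orthonormal d x r basis) and per-task weights W* (r x T).
B_star, _ = np.linalg.qr(rng.standard_normal((d, r)))
W_star = rng.standard_normal((r, T))
Theta_star = B_star @ W_star          # stacked reward parameters, rank r

# In a linear MDP the reward of task t is linear in the feature embedding:
# r_t(s, a) = phi(s, a)^T theta_t. With phi a stand-in feature vector:
phi = rng.standard_normal(d)          # placeholder for phi(s, a)
rewards = phi @ Theta_star            # vector of rewards, one entry per task
```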


Paper Structure

This paper contains 16 sections, 9 theorems, 53 equations, 2 figures, and 1 algorithm.

Key Result

Proposition 1

(Wedin $\sin\Theta$ Theorem; chen2021spectral) For two matrices $M^\star, M \in \mathbb{R}^{n_1 \times n_2}$, let $B^\star, B \in \mathbb{R}^{n_1 \times r}$ denote the matrices containing their top-$r$ left singular vectors, and let $V^{\star}, V \in \mathbb{R}^{r \times n_2}$ denote the matrices containing their top-$r$ right singular vectors. Let $\sigma_i^\star$ denote the $i$-th largest singular value of $M^\star$. If $\|M - M^\star\| < \sigma_r^\star - \sigma_{r+1}^\star$, then
$$\max\big\{\|B B^\top - B^\star B^{\star\top}\|,\ \|V^\top V - V^{\star\top} V^{\star}\|\big\} \leqslant \frac{\sqrt{2}\,\max\big\{\|(M - M^\star)^\top B^\star\|,\ \|(M - M^\star) V^{\star\top}\|\big\}}{\sigma_r^\star - \sigma_{r+1}^\star - \|M - M^\star\|}.$$
Furthermore, if $\|M-M^{\star}\| \leqslant (1-1/\sqrt{2})(\sigma_r^\star - \sigma_{r+1}^\star)$, then the bound simplifies to
$$\max\big\{\|B B^\top - B^\star B^{\star\top}\|,\ \|V^\top V - V^{\star\top} V^{\star}\|\big\} \leqslant \frac{2\,\max\big\{\|(M - M^\star)^\top B^\star\|,\ \|(M - M^\star) V^{\star\top}\|\big\}}{\sigma_r^\star - \sigma_{r+1}^\star}.$$
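The proposition can be checked numerically. The sketch below (an illustration under assumed dimensions, not code from the paper) builds a rank-$r$ matrix $M^\star$, perturbs it, and verifies the second (simplified) Wedin bound:

```python
import numpy as np

def top_r_svd(M, r):
    """Top-r left singular vectors, all singular values, top-r right rows."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :r], s, Vt[:r, :]

def sin_theta_dist(U, U_star):
    """Spectral-norm distance ||U U^T - U* U*^T|| between column spans."""
    return np.linalg.norm(U @ U.T - U_star @ U_star.T, ord=2)

rng = np.random.default_rng(1)
n1, n2, r = 50, 40, 3                 # assumed dimensions for illustration

# Rank-r ground truth M* and a small additive perturbation E.
M_star = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
E = 1e-2 * rng.standard_normal((n1, n2))
M = M_star + E

B_star, s_star, V_star = top_r_svd(M_star, r)   # V_star rows = right sing. vecs
B, _, V = top_r_svd(M, r)

gap = s_star[r - 1] - s_star[r]       # sigma_r* - sigma_{r+1}* (here ~ sigma_r*)
assert np.linalg.norm(E, 2) <= (1 - 1 / np.sqrt(2)) * gap  # bound's condition

lhs = max(sin_theta_dist(B, B_star), sin_theta_dist(V.T, V_star.T))
rhs = 2 * max(np.linalg.norm(E.T @ B_star, 2),
              np.linalg.norm(E @ V_star.T, 2)) / gap
print(f"sin-Theta distance {lhs:.2e} <= Wedin bound {rhs:.2e}")
```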

Figures (2)

  • Figure E1: Results for Experiment 1. We set $d=100$, $T=100$, $r=2$, $|\mathcal{S}|=1000$, $|\mathcal{A}|=10$. The random-policy baseline replaces stage 2 of Algorithm 1 with a random policy that chooses each action uniformly; the MoM baseline replaces stage 3 of Algorithm 1 with an MoM estimator. Our proposed approach outperforms both baselines.
  • Figure E2: Results for Experiments 1 and 2. Panel (a) presents regret plots for Experiment 1, comparing the proposed algorithm against the three baseline approaches. Panels (b) and (c) present results for Experiment 2, the grid-maze environment, in which the goal points for the five tasks are (1,1), (2,2), (3,3), (4,4), and (5,5), respectively. Panel (b) plots the subspace distance vs. the number of samples $K$; panel (c) plots the estimation error vs. the number of samples $K$, where the estimation error is defined as $\|\widehat{\Theta}_h-\Theta^\star_h\|_\text{F}/\|\Theta^\star_h\|_\text{F}$.
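The two quantities plotted in Figure E2 can be computed as below. This is a sketch, assuming `B_hat`/`B_star` and `Theta_hat`/`Theta_star` are hypothetical stand-ins for the estimated and ground-truth representation and reward matrices at step $h$:

```python
import numpy as np

def subspace_distance(B_hat, B_star):
    """sin-Theta distance between the column spans of two d x r matrices."""
    Q_hat, _ = np.linalg.qr(B_hat)    # orthonormalize so projectors are valid
    Q_star, _ = np.linalg.qr(B_star)
    return np.linalg.norm(Q_hat @ Q_hat.T - Q_star @ Q_star.T, ord=2)

def relative_estimation_error(Theta_hat, Theta_star):
    """||Theta_hat - Theta*||_F / ||Theta*||_F, the error plotted in panel (c)."""
    return (np.linalg.norm(Theta_hat - Theta_star, ord="fro")
            / np.linalg.norm(Theta_star, ord="fro"))
```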

Theorems & Definitions (18)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proof
  • Lemma 1
  • Proof
  • Theorem 1
  • Proof
  • Lemma 2
  • Proof
  • ...and 8 more