The Limits of Transfer Reinforcement Learning with Latent Low-rank Structure

Tyler Sam, Yudong Chen, Christina Lee Yu

TL;DR

This work studies the problem of transferring a latent low-rank representation when the source and target MDPs have transition kernels with low Tucker rank, and introduces the transfer-ability coefficient $\alpha$, which measures the difficulty of representational transfer.

Abstract

Many reinforcement learning (RL) algorithms are too costly to use in practice because of the large sizes $S$ and $A$ of the problem's state and action spaces. To resolve this issue, we study transfer RL with latent low-rank structure. We consider the problem of transferring a latent low-rank representation when the source and target MDPs have transition kernels with Tucker rank $(S, d, A)$, $(S, S, d)$, $(d, S, A)$, or $(d, d, d)$. In each setting, we introduce the transfer-ability coefficient $\alpha$ that measures the difficulty of representational transfer. Our algorithm learns latent representations in each source MDP and then exploits the linear structure to remove the dependence on $S$, $A$, or $SA$ in the target MDP regret bound. We complement our positive results with information-theoretic lower bounds that show our algorithms (excluding the $(d, d, d)$ setting) are minimax-optimal with respect to $\alpha$.
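
To make the Tucker rank $(S, S, d)$ setting concrete, the following is a minimal NumPy sketch, not the paper's algorithm. It builds a random transition kernel as a contraction of a core tensor with an action factor matrix, then checks the rank of each mode unfolding. The names `G`, `U`, `random_low_rank_kernel`, and `tucker_rank`, the index order `P[s_next, s, a]`, and the choice of nonnegative normalized factors are all assumptions made for illustration.

```python
import numpy as np


def mode_unfold(tensor, mode):
    """Flatten `tensor` into a matrix whose rows index the chosen mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def tucker_rank(tensor, tol=1e-8):
    """Return the tuple of ranks of the mode unfoldings (the multilinear rank)."""
    return tuple(
        int(np.linalg.matrix_rank(mode_unfold(tensor, m), tol=tol))
        for m in range(tensor.ndim)
    )


def random_low_rank_kernel(S, A, d, seed=0):
    """Hypothetical generator: P[s_next, s, a] = sum_i G[s_next, s, i] * U[a, i]."""
    rng = np.random.default_rng(seed)
    # Assumption: each slice G[:, s, i] is a distribution over next states.
    G = rng.random((S, S, d))
    G /= G.sum(axis=0, keepdims=True)
    # Assumption: each row U[a, :] holds convex weights over the d latent factors,
    # so every P[:, s, a] is a mixture of distributions and is itself a distribution.
    U = rng.random((A, d))
    U /= U.sum(axis=1, keepdims=True)
    P = np.einsum("tsi,ai->tsa", G, U)
    return P, G, U


P, G, U = random_low_rank_kernel(S=6, A=5, d=2)
ranks = tucker_rank(P)
print("multilinear rank of P:", ranks)
```

The action-mode unfolding factors through the $d$ columns of `U`, so its rank is at most $d$ even though the kernel has $S \cdot S \cdot A$ entries. This low-dimensional action representation is the kind of structure that the abstract proposes to learn in the source MDPs and reuse in the target MDP.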

Paper Structure

This paper contains 31 sections, 49 theorems, 279 equations, 2 figures, 3 tables, 9 algorithms.

Key Result

Theorem 1

There exist two transfer RL instances such that (i) they satisfy Assumptions asm:tr_s and asm:transfer_s, (ii) they cannot be distinguished without observing $\Omega(\alpha^2)$ samples in the source phase, and (iii) they have target action latent features that are orthogonal to each other.

Figures (2)

  • Figure 1: Transition kernel with Tucker rank $(S , S, d )$
  • Figure 2: A hard transfer learning example where one estimates $G_T$ with $G_1$ and $G_2$

Theorems & Definitions (103)

  • Definition 1: Tucker Rank
  • Definition 2: Incoherence
  • Definition 3: Transfer-ability Coefficient
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Lemma 1
  • proof
  • Corollary 1
  • ...and 93 more