Towards Understanding the Benefit of Multitask Representation Learning in Decision Process

Rui Lu, Yang Yue, Andrew Zhao, Simon Du, Gao Huang

TL;DR

This work develops a theoretical and empirical framework for Multitask Representation Learning (MRL) in reinforcement learning with unknown, non-linear representations. It introduces Generalized Functional Upper Confidence Bound (GFUCB), a principled algorithm that operates over a shared non-linear representation φ and multiple task heads, and proves regret bounds that scale favorably with the number of tasks. The analysis leverages the Eluder dimension to quantify complexity and shows mechanisms by which joint training accelerates learning and enables transfer to related tasks. Empirical studies with neural-network-based backbones in bandit and MDP settings corroborate the theory, demonstrating improved sample efficiency and effective transfer when leveraging shared representations across tasks.

Abstract

Multitask Representation Learning (MRL) has emerged as a prevalent technique to improve sample efficiency in Reinforcement Learning (RL). Empirical studies have found that training agents on multiple tasks simultaneously within online and transfer learning environments can greatly improve efficiency. Despite its popularity, a comprehensive theoretical framework that explains why it works is still missing. Prior analyses have predominantly assumed that agents either possess a known representation function or use functions from a linear class, both of which are impractical. Real-world applications typically require sophisticated, non-linear representation functions such as neural networks, which are not given in advance and must be learned. Our work tries to fill this gap by extending the analysis to unknown non-linear representations, giving a comprehensive account of the mechanism in both the online and transfer learning settings. We consider an agent that simultaneously plays $M$ contextual bandits (or MDPs) and learns a shared representation function $\phi$ from a non-linear function class $\Phi$ using our novel Generalized Functional Upper Confidence Bound algorithm (GFUCB). We formally prove that this approach yields a regret upper bound that improves on the lower bound for learning the $M$ tasks separately, marking the first demonstration of MRL's efficacy in a general function class. This framework also explains the contribution of representations to transfer learning when faced with new, yet related tasks, and identifies key conditions for successful transfer. Empirical experiments further corroborate our theoretical findings.
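The shared-representation setup described in the abstract can be made concrete with a small sketch. The snippet below is an illustration only, not the paper's GFUCB implementation: it models each task's prediction as a linear head on a non-linear representation, $f_i(x) = \langle w_i, \phi(x) \rangle$, and compares how many parameters must be learned with one shared backbone versus one backbone per task. The one-layer tanh backbone, the dimensions, and all function names (`make_backbone`, `phi`, `predict`, `count_params`) are assumptions made for this example, and raw parameter count is only a crude stand-in for the Eluder-dimension complexity measure the paper actually uses.

```python
import math
import random


def make_backbone(in_dim, feat_dim, rng):
    """Random weights for a one-layer tanh representation phi: R^in_dim -> R^feat_dim."""
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(feat_dim)]


def phi(backbone, x):
    """Non-linear representation: tanh of a linear map (a stand-in for a neural network)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in backbone]


def predict(backbone, head, x):
    """Task prediction f_i(x) = <w_i, phi(x)>: a linear head on the representation."""
    return sum(w * z for w, z in zip(head, phi(backbone, x)))


def count_params(in_dim, feat_dim, num_tasks, shared):
    """Parameters to learn: one backbone if shared, otherwise one backbone per task."""
    backbone_params = in_dim * feat_dim
    head_params = feat_dim
    num_backbones = 1 if shared else num_tasks
    return num_backbones * backbone_params + num_tasks * head_params


# Hypothetical sizes chosen only for illustration.
rng = random.Random(0)
in_dim, feat_dim, num_tasks = 20, 4, 10
backbone = make_backbone(in_dim, feat_dim, rng)
heads = [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)] for _ in range(num_tasks)]

# One context, evaluated by all M task heads through the same backbone.
x = [rng.gauss(0.0, 1.0) for _ in range(in_dim)]
rewards = [predict(backbone, head, x) for head in heads]

separate = count_params(in_dim, feat_dim, num_tasks, shared=False)
joint = count_params(in_dim, feat_dim, num_tasks, shared=True)
print(separate, joint)  # 840 120
```

In this toy configuration the backbone dominates the parameter budget, so sharing it across tasks shrinks what has to be learned from 840 to 120 parameters. This is only meant to give intuition for why joint training can need fewer samples; the formal statement is the regret bound in the paper.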

Paper Structure

This paper contains 51 sections, 109 equations, 6 figures, 3 algorithms.

Figures (6)

  • Figure 1: Illustration of the mechanism behind multitask representation learning. Training separately on each task essentially learns in the space $\mathcal{F}^M=(\mathcal{L} \circ \Phi)^M$, which requires samples to independently locate $M$ different representation backbones $\phi_i$, while an MRL method like GFUCB learns in the space $\mathcal{F}^{\otimes M} = \mathcal{L}^M\circ \Phi$, which is much more compact because all tasks share the same backbone. Therefore, MRL requires far fewer samples to learn the representation $\phi$, and its total regret has a sublinear dependency on the number of tasks $M$.
  • Figure 2: Bandit experiments. (a) Multitask Learning. Cumulative regret over steps for $M=1,5,10$. (b) Transfer Learning. The representation is pretrained on $M=5,10$ tasks with varying numbers of training steps. We then run LinUCB on each of $M'=3$ new tasks and report the average regret.
  • Figure 3: The 4x4 grid MDP task.
  • Figure 4: MDP experiments. (a) Multitask Learning. Cumulative regret over steps for $M=1,5,10$. (b) Transfer Learning. The representation is pretrained on $M=1,5,10$ tasks. The average regret on $M'$ new tasks is then reported.
  • Figure 5: (a) The relationship between the prediction error on unseen data and the bonus it receives from finetuning. The grey line is $y=x$. (b) The average bonus level of 100 test images with respect to the number of samples in the training set; the shaded area shows $\pm 1$ standard deviation.
  • ...and 1 more figure
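The transfer setup in the Figure 2(b) caption runs LinUCB on new tasks on top of a pretrained representation. As a rough sketch of what that step could look like, the code below implements standard LinUCB over fixed feature vectors, treating them as the output of a frozen representation. It is not taken from the paper: the exploration weight `alpha`, the ridge term `reg`, the three hand-picked arm features, the hidden head `theta_star`, and the noise level are all hypothetical values chosen for illustration.

```python
import math
import random


class LinUCB:
    """Standard LinUCB on fixed d-dimensional features (here: a frozen representation)."""

    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.dim = dim
        self.alpha = alpha
        # A_inv tracks (reg * I + sum z z^T)^{-1}; b tracks sum reward * z.
        self.A_inv = [[(1.0 / reg if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _matvec(self, v):
        return [sum(a * x for a, x in zip(row, v)) for row in self.A_inv]

    def ucb(self, z):
        theta = self._matvec(self.b)  # ridge estimate A^{-1} b
        mean = sum(t * zi for t, zi in zip(theta, z))
        quad = sum(zi * ai for zi, ai in zip(z, self._matvec(z)))
        return mean + self.alpha * math.sqrt(max(quad, 0.0))

    def select(self, arm_features):
        scores = [self.ucb(z) for z in arm_features]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, z, reward):
        # Sherman-Morrison rank-one update of A_inv after adding z z^T.
        Az = self._matvec(z)
        denom = 1.0 + sum(zi * ai for zi, ai in zip(z, Az))
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[i][j] -= Az[i] * Az[j] / denom
        self.b = [bi + reward * zi for bi, zi in zip(self.b, z)]


# Hypothetical frozen features phi(x, a) for three arms and a hidden task head.
arm_features = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
theta_star = [1.0, 0.2]
expected = [sum(t * zi for t, zi in zip(theta_star, z)) for z in arm_features]

rng = random.Random(0)
agent = LinUCB(dim=2, alpha=1.0)
regret = 0.0
pulls = [0, 0, 0]
for _ in range(500):
    arm = agent.select(arm_features)
    reward = expected[arm] + rng.gauss(0.0, 0.1)  # noisy linear reward
    agent.update(arm_features[arm], reward)
    regret += max(expected) - expected[arm]
    pulls[arm] += 1
print(pulls, round(regret, 2))
```

Because the representation is frozen, the new task reduces to a linear bandit in the feature dimension, which is why a good pretrained representation can make the new task cheap to learn. If the features do not capture the new task's reward structure, the linear model is misspecified and this sketch gives no guarantee.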