Sharing Knowledge in Multi-Task Deep Reinforcement Learning

Carlo D'Eramo, Davide Tateo, Andrea Bonarini, Marcello Restelli, Jan Peters

TL;DR

This paper addresses learning multiple tasks simultaneously in deep reinforcement learning by introducing and validating shared representations across tasks. It provides theoretical guarantees by extending finite-time Approximate Value Iteration bounds and approximation-error analysis to the multi-task setting, showing that shared representations can reduce error accumulation as the number of tasks grows. A practical neural-network architecture with per-task adapters, a shared feature extractor, and task-specific heads is proposed and instantiated in multi-task variants of FQI, DQN, and DDPG. Empirically, the approach yields improved sample efficiency and performance on MuJoCo and classic control benchmarks, with transfer benefits observed from pre-trained shared representations. Overall, the work establishes both theoretical and empirical justification for joint multi-task representation learning in deep RL.
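The three-part architecture named above (per-task adapter, shared feature extractor, task-specific head) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name `MultiTaskQNetwork`, the helper `dense`, the layer sizes, the ReLU activations, and the use of one linear layer per block are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(in_dim, out_dim):
    """Hypothetical helper: small random weights and a zero bias for one linear layer."""
    return rng.normal(0.0, 0.1, size=(in_dim, out_dim)), np.zeros(out_dim)


def relu(x):
    return np.maximum(x, 0.0)


class MultiTaskQNetwork:
    """Illustrative only: one adapter and one head per task, one shared block."""

    def __init__(self, state_dims, n_actions, adapter_dim=32, shared_dim=64):
        # Per-task adapters map task-specific inputs to a common width.
        self.adapters = [dense(d, adapter_dim) for d in state_dims]
        # The shared feature extractor is the only block used by every task.
        self.shared = dense(adapter_dim, shared_dim)
        # Task-specific heads map shared features to per-action Q-values.
        self.heads = [dense(shared_dim, a) for a in n_actions]

    def q_values(self, task, states):
        w, b = self.adapters[task]
        z = relu(states @ w + b)
        w, b = self.shared
        features = relu(z @ w + b)
        w, b = self.heads[task]
        return features @ w + b


# Two hypothetical tasks with different state and action dimensions.
net = MultiTaskQNetwork(state_dims=[4, 6], n_actions=[2, 3])
q_task0 = net.q_values(0, rng.normal(size=(5, 4)))
q_task1 = net.q_values(1, rng.normal(size=(5, 6)))
```

Because only the adapters and heads are indexed by task, tasks with different input and output sizes can still pass through the same shared parameters, which is the property the shared-representation argument relies on.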

Abstract

We study the benefit of sharing representations among tasks to enable the effective use of deep neural networks in Multi-Task Reinforcement Learning. We leverage the assumption that learning from different tasks that share common properties helps to generalize knowledge across them, resulting in more effective feature extraction than learning a single task. Intuitively, the resulting set of features offers performance benefits when used by Reinforcement Learning algorithms. We prove this by providing theoretical guarantees that highlight the conditions under which it is convenient to share representations among tasks, extending the well-known finite-time bounds of Approximate Value-Iteration to the multi-task setting. In addition, we complement our analysis by proposing multi-task extensions of three Reinforcement Learning algorithms, which we empirically evaluate on widely used Reinforcement Learning benchmarks, showing significant improvements over the single-task counterparts in terms of sample efficiency and performance.

Paper Structure (25 sections, 6 theorems, 36 equations, 3 figures, 1 table)


Key Result

Theorem 1

(Theorem 3.4 of Farahmand (2011)) Let $K$ be a positive integer, and $Q_{\text{max}} \leq \frac{R_{\text{max}}}{1-\gamma}$. Then for any sequence $(Q_k)^K_{k=0}\subset B(\mathcal{S}\times\mathcal{A}, Q_{\text{max}})$ and the corresponding sequence $(\varepsilon_k)_{k=0}^{K-1}$, where […] with $\mathcal{E}(\varepsilon_{0}, \dots, \varepsilon_{K-1};r)=\sum^{K-1}_{k=0}\alpha_k^{2r}$ […]

Figures (3)

  • Figure 1: (a) The architecture of the neural network we propose to learn $T$ tasks simultaneously. The $w_t$ block maps each input $x_t$ from task $\mu_t$ to a shared set of layers $h$, which extracts a common representation of the tasks. Finally, the shared representation is specialized in block $f_t$ and the output $y_t$ of the network is computed. Note that each block can be composed of arbitrarily many layers. (b) Results of FQI and multi-task FQI averaged over $4$ tasks in Car-On-Hill, showing $\lVert Q^* - Q^{\pi_K}\rVert$ on the left, and the discounted cumulative reward on the right. (c) Results of multi-task FQI showing $\lVert Q^* - Q^{\pi_K}\rVert$ for an increasing number of tasks. Both results in (b) and (c) are averaged over $100$ experiments, and show the $95\%$ confidence intervals.
  • Figure 2: Discounted cumulative reward averaged over $100$ experiments of DQN and multi-task DQN, for each task and for transfer learning in the Acrobot problem. An epoch consists of $1{,}000$ steps, after which the greedy policy is evaluated for $2{,}000$ steps. The $95\%$ confidence intervals are shown.
  • Figure 3: Discounted cumulative reward averaged over $40$ experiments of DDPG and multi-task DDPG, for each task and for transfer learning in the Inverted-Double-Pendulum and Hopper problems. An epoch consists of $10{,}000$ steps, after which the greedy policy is evaluated for $5{,}000$ steps. The $95\%$ confidence intervals are shown.
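The quantity reported in these figures, discounted cumulative reward averaged over repeated experiments with a $95\%$ confidence interval, can be sketched as follows. The function names are hypothetical, and the interval uses the normal approximation (critical value 1.96), which is an assumption: the captions do not state how the intervals were computed.

```python
import math


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one evaluation run, accumulated backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def mean_and_ci95(values):
    """Sample mean and half-width of a 95% interval (normal approximation, assumed)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)  # unbiased estimate
    half_width = 1.96 * math.sqrt(variance / n)
    return mean, half_width


# Toy data: three runs of three steps each, not taken from the paper.
runs = [[1.0, 1.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
returns = [discounted_return(r, gamma=0.5) for r in runs]
mean, half_width = mean_and_ci95(returns)
```

With few runs, a Student-t critical value would give a wider and more appropriate interval than 1.96; at the 40 to 100 runs used in the figures the difference is small.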

Theorems & Definitions (10)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 4
  • Proof of Theorem (label T:avi_bound)
  • Proof of Lemma 4 (label T:eps_star)
  • Theorem 5
  • Theorem 6
  • Proof of Theorem (label T:api_bound)
  • Proof of Theorem (label T:apprx)