I Know How: Combining Prior Policies to Solve New Tasks

Malio Li, Elia Piccoli, Vincenzo Lomonaco, Davide Bacciu

TL;DR

This work proposes a new framework, I Know How (IKH), which provides a common formalization of Multi-Task Reinforcement Learning. It focuses on modularity and compositionality of knowledge to improve an agent's ability to learn and adapt efficiently to dynamic environments.

Abstract

Multi-Task Reinforcement Learning aims at developing agents that can continually evolve and adapt to new scenarios. This goal is hard to achieve because of catastrophic forgetting and the high demand for computational resources. Learning from scratch for each new task is neither viable nor sustainable, so agents should be able to collect and exploit prior knowledge when facing new problems. Several methodologies have addressed the problem from different perspectives, but they lack a common structure. In this work, we propose a new framework, I Know How (IKH), which provides a common formalization. Our methodology focuses on modularity and compositionality of knowledge to improve an agent's ability to learn and adapt efficiently to dynamic environments. To support our framework definition, we present a simple application in a simulated driving environment and compare its performance with that of state-of-the-art approaches.

Paper Structure (12 sections, 1 equation, 8 figures, 2 tables, 1 algorithm)

Figures (8)

  • Figure 1: Illustration of the implementation of the IKH framework used in this work. Given a set $\Phi$ of $m$ policies pre-trained on auxiliary tasks, $\pi_{MP}$ predicts the weights $w$ assigned to the actions proposed by those policies, which together define the agent's behavior.
  • Figure 2: The Racetrack scenario used to evaluate agents' performance. The track is divided into nine sectors, delimited by the red lines, based on the different characteristics of the circuit.
  • Figure 3: Average number and variance of sections for different agents using a varying number of pre-trained policies, compared to baseline methods. The acronyms reported below refer to the names of the environments (see the setup section) that are solved by the policies in $\Phi$.
  • Figure 4: Each image portrays the agent (yellow car) in a different part of the track. Each weight predicted by $\pi_{MP}$ is bounded in $[0,1]$. Mapping the upper bound to white and the lower bound to black makes it possible to see how different pre-trained policies are exploited in different scenarios.
  • Figure 5: Reward graphs of pre-trained policies.
  • ...and 3 more figures
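
The captions of Figures 1 and 4 describe a master policy $\pi_{MP}$ that assigns a weight in $[0,1]$ to the action of each pre-trained policy in $\Phi$. The sketch below illustrates one way such a combination could look. It is only an illustration under stated assumptions: the weighted sum as the combination rule, the function names `combine_actions` and `ikh_step`, the toy policies, and the `[throttle, steering]` action layout are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Action = List[float]
Policy = Callable[[Sequence[float]], Action]


def combine_actions(actions: Sequence[Action], weights: Sequence[float]) -> Action:
    """Weighted sum of per-policy actions (assumed rule); weights are clipped to [0, 1]."""
    if len(actions) != len(weights):
        raise ValueError("need exactly one weight per prior policy")
    clipped = [min(1.0, max(0.0, w)) for w in weights]
    dim = len(actions[0])
    return [sum(w * a[i] for w, a in zip(clipped, actions)) for i in range(dim)]


def ikh_step(
    obs: Sequence[float],
    priors: Sequence[Policy],
    master: Callable[[Sequence[float]], Sequence[float]],
) -> Action:
    """Query every prior policy, then let the master policy weight their actions."""
    actions = [pi(obs) for pi in priors]
    weights = master(obs)  # one weight per prior policy
    return combine_actions(actions, weights)


# Hypothetical stand-ins for pre-trained policies; actions are [throttle, steering].
def keep_lane(obs: Sequence[float]) -> Action:
    return [0.5, -obs[0]]  # steer against the lateral offset in obs[0]


def brake(obs: Sequence[float]) -> Action:
    return [-1.0, 0.0]


# Hypothetical master policy with fixed weights; in practice this would be learned.
def master(obs: Sequence[float]) -> Sequence[float]:
    return [1.0, 0.25]


action = ikh_step([0.2], [keep_lane, brake], master)
print(action)
```

In this toy setup the prior policies stay fixed and only the weights change the resulting behavior, which matches the modular reuse of prior knowledge that the abstract describes.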