Exploration in Knowledge Transfer Utilizing Reinforcement Learning

Adam Jedlička, Tatiana Valentine Guy

TL;DR

Knowledge transfer in reinforcement learning benefits from efficient exploration to reuse information across similar tasks. The paper introduces Deep TTQL, a deep-learning extension of Target Transfer Q-learning (TTQL), and evaluates three exploration strategies (ε-greedy, Boltzmann, and upper confidence bound, UCB) on a high-dimensional virtual drone navigation task. Key contributions include a practical MNBE-based transfer decision rule and the use of a Source Q-network alongside a Main/Target Q-network to enable scalable transfer. The findings show that TTQL can outperform non-transfer baselines and that the choice of exploration strategy influences early learning dynamics, providing guidance for transfer learning in complex control tasks.

Abstract

The contribution focuses on the problem of exploration within the task of knowledge transfer. Knowledge transfer refers to the useful application, in the target task, of knowledge gained while learning the source task. The intended benefit of knowledge transfer is to speed up learning of the target task. The article compares several exploration methods used within a deep transfer learning algorithm, specifically Deep Target Transfer $Q$-learning. The methods compared are $\varepsilon$-greedy, Boltzmann, and upper confidence bound exploration. The transfer learning algorithm and the exploration methods were tested on the virtual drone problem. The results show that upper confidence bound exploration performs best among these options. Its suitability for other applications remains to be checked.
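To make the three exploration methods named above concrete, here is a minimal Python sketch of how each one could pick an action from a list of Q-value estimates for the current state. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `temperature` and `c` parameters, the per-action visit `counts`, and the UCB1-style bonus are all hypothetical choices, and in Deep TTQL the Q-values would come from a network rather than a plain list.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def boltzmann(q_values, temperature, rng):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    q_max = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    return rng.choices(range(len(q_values)), weights=weights, k=1)[0]


def ucb(q_values, counts, t, c=2.0):
    """Pick the action maximizing Q plus an assumed UCB1-style bonus.

    counts[a] is how often action a was taken so far, t is the total step count.
    Untried actions are selected first so the bonus is always well defined.
    """
    for a, n in enumerate(counts):
        if n == 0:
            return a
    scores = [q + c * math.sqrt(math.log(t) / n) for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=lambda a: scores[a])


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.1, 0.9, 0.3]
    print(epsilon_greedy(q, epsilon=0.1, rng=rng))
    print(boltzmann(q, temperature=0.5, rng=rng))
    print(ucb(q, counts=[5, 5, 1], t=11))
```

The main design difference is where the randomness comes from: ε-greedy and Boltzmann explore by sampling, whereas this UCB variant is deterministic and explores by favoring rarely tried actions through the count-based bonus.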

Paper Structure

This paper contains 12 sections, 10 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Interaction of the agent with the environment in MDP
  • Figure 2: Image from RGB camera segmented by a grid into cells corresponding to possible actions
  • Figure 3: Example of an image from depth of field camera
  • Figure 4: Results of Experiment 1 - Reward accumulated over episodes with different numbers of iterations
  • Figure 5: The evolution of transfer instances over time
  • ...and 2 more figures

Theorems & Definitions (2)

  • Definition 1: $Q$-function
  • Definition 2