Knowledge Transfer in Deep Reinforcement Learning via an RL-Specific GAN-Based Correspondence Function

Marko Ruman, Tatiana V. Guy

TL;DR

The paper adapts Cycle Generative Adversarial Networks (CycleGAN) to reinforcement learning to enable one-to-one knowledge transfer between two tasks, and extends the loss function with two new components: a model loss and a Q-loss.

Abstract

Deep reinforcement learning has demonstrated superhuman performance in complex decision-making tasks, but it struggles with generalization and knowledge reuse, both key aspects of true intelligence. This article introduces a novel approach that adapts Cycle Generative Adversarial Networks (CycleGAN) specifically to reinforcement learning, enabling one-to-one knowledge transfer between two tasks. Our method extends the loss function with two new components: a model loss, which captures the dynamic relationships between the source and target tasks, and a Q-loss, which identifies states that significantly influence the target decision policy. Tested on the 2-D Atari game Pong, our method achieved 100% knowledge transfer between identical tasks and, for a rotated task, either 100% knowledge transfer or a 30% reduction in training time, depending on the network architecture. In contrast, standard Generative Adversarial Networks (GAN) and CycleGAN performed worse than training from scratch in the majority of cases. The results indicate that the proposed method improves knowledge generalization in deep reinforcement learning.
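The way the two new components enter the objective can be illustrated with a minimal sketch. It assumes the total loss is a weighted sum of an adversarial term, a cycle-consistency term, the Q-loss, and the model loss, with the weights $\lambda_{Cyc}$, $\lambda_Q$ and $\lambda_M$ that appear in the figure captions. The additive form, the `LossWeights` class, the `total_loss` function, and all numeric values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    """Hypothetical container for the weights named in the figure captions."""
    cyc: float = 0.0    # lambda_Cyc: weight of the cycle-consistency term
    q: float = 0.0      # lambda_Q: weight of the Q-loss term
    model: float = 0.0  # lambda_M: weight of the model-loss term


def total_loss(adversarial: float, cycle: float, q_loss: float,
               model_loss: float, w: LossWeights) -> float:
    """Weighted sum of the individual loss terms (assumed additive form)."""
    return adversarial + w.cyc * cycle + w.q * q_loss + w.model * model_loss


# Placeholder loss values, chosen only to make the arithmetic easy to follow.
terms = dict(adversarial=1.0, cycle=2.0, q_loss=3.0, model_loss=4.0)

gan_only = total_loss(**terms, w=LossWeights())                 # adversarial term only
cyclegan_like = total_loss(**terms, w=LossWeights(cyc=10.0))    # adds cycle consistency
full = total_loss(**terms, w=LossWeights(cyc=10.0, q=1.0, model=0.5))
```

Under this assumed form, setting $\lambda_Q = \lambda_M = 0$ leaves a CycleGAN-style objective, and additionally setting $\lambda_{Cyc} = 0$ leaves a plain GAN objective, which is one plausible reading of how the GAN and CycleGAN baselines relate to the full method.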

Paper Structure (20 sections, 13 equations, 9 figures, 1 algorithm)

Figures (9)

  • Figure 1: The proposed TL between tasks $S$ and $T$.
  • Figure 2: Standard Pong (Atari).
  • Figure 3: Pong rotated by 90 degrees (Atari).
  • Figure 4: Experiment 1: Average accumulated reward per game when playing five games with the transformed $Q$-function. Every $1000$ learning steps, the agent paused learning of the correspondence function and played five games; the average reward per game is plotted. The performance is shown for different values of the loss parameters $\lambda_{Cyc}$, $\lambda_Q$ and $\lambda_M$. Panels (a) and (b) show the baselines using the GAN and CycleGAN methods.
  • Figure 5: Experiment 1: Screenshots of the game depicting the progress of learning the correspondence function $\mathcal{C}$ after $0$, $30000$, $60000$ and $80000$ steps. The results are shown for different values of the final-loss parameters $\lambda_{Cyc}$, $\lambda_Q$ and $\lambda_M$. The left parts are game frames of the target task serving as states, and the right parts are the same states mapped by the learned correspondence function $\mathcal{C}$.
  • ...and 4 more figures

Theorems & Definitions (1)

  • Definition 3.1: Correspondence function