Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain

Janarthanan Rajendran, Aravind Srinivas, Mitesh M. Khapra, P Prasanna, Balaraman Ravindran

TL;DR

A2T addresses the central challenge of transferring knowledge in reinforcement learning without incurring negative transfer, and it enables selective transfer from multiple source tasks within the same domain. It introduces a soft-attention mechanism that combines fixed source solutions and a learnable base network on a per-state basis, allowing the agent to attend to different experts as needed. The approach is instantiated for both policy and value transfer, using REINFORCE, Actor-Critic, and DQN-based methods with stable training via target networks and experience replay. Empirical results across simple and complex tasks demonstrate improved learning speed and final performance, along with clear demonstrations of selective transfer and robust avoidance of negative transfer. This framework offers a flexible, modular path toward continual and meta-learning in RL by treating source-task solutions as differentiable components that can be orchestrated by a neural attention mechanism.
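The per-state soft-attention combination described above can be illustrated with a minimal sketch. It assumes each source solution and the base network emit a vector over actions (action probabilities for policy transfer, or action values for value transfer), and that an attention network has already produced one logit per candidate for the current state. The names `a2t_combine`, `source_outputs`, `base_output`, and `attention_logits` are hypothetical and chosen for illustration only; the paper's networks and training loops are not reproduced here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def a2t_combine(source_outputs, base_output, attention_logits):
    """Blend fixed source outputs with the base network output for one state.

    source_outputs:   list of N vectors, one per fixed source solution
    base_output:      vector from the learnable base network
    attention_logits: N + 1 scores for this state (sources first, base last);
                      assumed here to come from an attention network
    Returns the attention-weighted output and the attention weights.
    """
    candidates = list(source_outputs) + [base_output]
    if len(attention_logits) != len(candidates):
        raise ValueError("need one attention logit per source plus one for the base")
    weights = softmax(attention_logits)
    n_actions = len(base_output)
    mixed = [
        sum(w * c[a] for w, c in zip(weights, candidates))
        for a in range(n_actions)
    ]
    return mixed, weights


# Illustrative values only: two source policies and a base policy over two actions.
sources = [[0.9, 0.1], [0.2, 0.8]]
base = [0.5, 0.5]
logits = [2.0, 0.0, 0.0]  # this state favours the first source
mixed, weights = a2t_combine(sources, base, logits)
print(mixed, weights)
```

Because the weights are a softmax, the result is a convex combination: if every source is unhelpful in some state, the attention can place nearly all of its mass on the base network, which is how this kind of design can sidestep negative transfer. In a trainable version, only the base and attention networks would receive gradients while the source solutions stay fixed.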

Abstract

Transferring knowledge from prior source tasks in solving a new target task can be useful in several learning applications. The application of transfer poses two serious challenges which have not been adequately addressed. First, the agent should be able to avoid negative transfer, which happens when the transfer hampers or slows down the learning instead of helping it. Second, the agent should be able to selectively transfer, which is the ability to select and transfer from different and multiple source tasks for different parts of the state space of the target task. We propose A2T (Attend, Adapt and Transfer), an attentive deep architecture which adapts and transfers from these source tasks. Our model is generic enough to effect transfer of either policies or value functions. Empirical evaluations on different learning algorithms show that A2T is an effective architecture for transfer by being able to avoid negative transfer while transferring selectively from multiple source tasks in the same domain.

Paper Structure

This paper contains 14 sections, 18 equations, 13 figures.

Figures (13)

  • Figure 1: (a) A2T architecture. The dotted arrows represent the path of backpropagation. (b) Actor-Critic using A2T.
  • Figure 2: Different worlds for policy transfer experiments
  • Figure 3: Results of the selective policy transfer experiments
  • Figure 4: Visualisation of the attention weights in the Selective Transfer with Attention Network experiment: Green and Blue bars signify the attention probabilities for Expert-1 ($L1$) and Expert-2 ($L2$) respectively. We see that in the first two snapshots, the ball is in the lower quadrant and as expected, the attention is high on Expert-1, while in the third and fourth snapshots, as the ball bounces back into the upper quadrant, the attention increases on Expert-2.
  • Figure 5: Selective Value Transfer.
  • ...and 8 more figures