Learning to Multi-Task by Active Sampling

Sahil Sharma, Ashutosh Jha, Parikshit Hegde, Balaraman Ravindran

TL;DR

The paper tackles online multi-task reinforcement learning by introducing active-sampling strategies that decide which task to train on next, eliminating the need for task-specific expert networks. It develops three instantiations (A5C, UA4C, EA4C) on top of an A3C-based multi-tasking network and evaluates them on seven Atari multi-tasking instances (MTIs), including a 21-task instance. The results show substantial gains over a baseline MTL approach that samples tasks uniformly, with EA4C generalizing best on large MTIs and A5C performing best on simpler ones. Further analyses indicate that the learned representations become more task-agnostic. The work highlights the potential of task-aware curricula in online MTL and points to future directions in regularizing representations so that they apply more broadly across tasks.

Abstract

One of the long-standing challenges in Artificial Intelligence for learning goal-directed behavior is to build a single agent which can solve multiple tasks. Recent progress in multi-task learning for goal-directed sequential problems has been in the form of distillation-based learning, wherein a student network learns from multiple task-specific expert networks by mimicking their task-specific policies. While such approaches offer a promising solution to the multi-task learning problem, they require supervision from large expert networks, which in turn require extensive data and computation time for training. In this work, we propose an efficient multi-task learning framework which solves multiple goal-directed tasks in an on-line setup without the need for expert supervision. Our work uses active learning principles to achieve multi-task learning by sampling the harder tasks more than the easier ones. We propose three distinct models under our active sampling framework: an adaptive method with extremely competitive multi-tasking performance; a UCB-based meta-learner which casts the problem of picking the next task to train on as a multi-armed bandit problem; and a meta-learning method that casts the next-task picking problem as a full Reinforcement Learning problem and uses actor-critic methods to optimize the multi-tasking performance directly. We demonstrate results in the Atari 2600 domain on seven multi-tasking instances: three 6-task instances, one 8-task instance, two 12-task instances and one 21-task instance.
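
The idea of sampling harder tasks more often, and of treating the choice of the next task as a bandit problem, can be illustrated with a minimal Python sketch. The abstract does not say how hardness is measured, so this sketch makes its own assumptions: each task has a known positive target score, hardness is the normalized shortfall from that target, the adaptive sampler is a softmax over shortfalls, and the bandit variant uses the standard UCB1 bonus. The function names, the `temperature` and `c` parameters, and the example scores are all hypothetical and are not taken from the paper.

```python
import math


def task_sampling_probs(current_scores, target_scores, temperature=0.1):
    """Softmax over each task's normalized shortfall from its target score.

    Assumption: every target score is positive. A larger shortfall
    (a harder task) yields a higher sampling probability.
    """
    tasks = list(target_scores)
    shortfalls = []
    for task in tasks:
        target = target_scores[task]
        shortfall = (target - current_scores[task]) / target
        # Clip to [0, 1]: 1 means no progress, 0 means target reached.
        shortfalls.append(min(1.0, max(0.0, shortfall)))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(shortfalls)
    weights = [math.exp((s - top) / temperature) for s in shortfalls]
    total = sum(weights)
    return {task: w / total for task, w in zip(tasks, weights)}


def ucb_pick(mean_shortfall, pull_counts, total_pulls, c=1.0):
    """UCB1-style choice of the next task to train on.

    The 'reward' of an arm is assumed to be the task's mean shortfall,
    so tasks that are hard or rarely trained get picked more often.
    """
    # Train on every task at least once before applying the bonus.
    for task, count in pull_counts.items():
        if count == 0:
            return task

    def upper_bound(task):
        bonus = math.sqrt(2.0 * math.log(total_pulls) / pull_counts[task])
        return mean_shortfall[task] + c * bonus

    return max(pull_counts, key=upper_bound)


# Illustrative usage with made-up scores for two hypothetical tasks.
probs = task_sampling_probs(
    current_scores={"pong": 18.0, "breakout": 50.0},
    target_scores={"pong": 20.0, "breakout": 400.0},
)
next_task = ucb_pick(
    mean_shortfall={"pong": 0.1, "breakout": 0.875},
    pull_counts={"pong": 5, "breakout": 5},
    total_pulls=10,
)
```

A lower `temperature` concentrates sampling on the hardest task, while a higher one approaches uniform sampling. In the bandit variant, `c` plays a similar role by trading off exploitation of known-hard tasks against revisiting tasks that have been trained less often.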

Paper Structure

This paper contains 18 sections, 6 equations, 25 figures, 11 tables, 7 algorithms.

Figures (25)

  • Figure 1: Multi-tasking instance MT7 ($21$ tasks). An anonymous playlist of game-play for MT7 is at: https://goo.gl/GBXfWD. It verifies our claim: the tasks are visually different, difficult and unrelated.
  • Figure 2: A visualization of our Active-sampling based Multi-Task Learning Framework
  • Figure 3: Evolution of game-play performance scores for the A5C, EA4C, UA4C and BA3C agents. Training curves for all the multi-tasking instances are presented in Appendix $D$.
  • Figure 4: Understanding abstract LSTM features in our proposed methods by analyzing firing patterns
  • Figure 5: Turn Off analysis heat-maps for all the agents. For BA3C, since the agent scored 0 on one of the games, normalization along the neuron was done only across the other 5 games.
  • ...and 20 more figures