Auxiliary task discovery through generate-and-test

Banafsheh Rafiee, Sina Ghiassian, Jun Jin, Richard Sutton, Jun Luo, Adam White

TL;DR

This paper addresses the challenge of autonomously discovering auxiliary tasks to improve the data efficiency of reinforcement learning. It introduces a generate-and-test framework in which a generator proposes new auxiliary tasks and a tester evaluates them by measuring how much the features they induce contribute to the main task. A Master-User learning strategy makes this attribution possible by ensuring that each feature is shaped by exactly one task. A new usefulness measure for auxiliary tasks is defined through these feature-level contributions, and a replacement mechanism prunes ineffective tasks. Experiments in gridworlds and a pinball domain show that the method outperforms both learning without auxiliary tasks and learning with fixed random auxiliary tasks, while a variant with a feature-attainment generator is potentially more scalable. The work offers a practical, tunable path toward automatic auxiliary task discovery and lays the groundwork for future integration with meta-learning and for scaling to larger domains.
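The generate-and-test loop summarized above can be sketched as a periodic replacement step. This is a minimal illustration under stated assumptions, not the authors' algorithm: auxiliary tasks are stood in for by subgoal states, the tester's utilities are taken as already computed, and the names `generate_task` and `replace_low_utility` and the `replace_fraction` and `maturity` parameters are hypothetical.

```python
import random


def generate_task(rng, states):
    """Generator (hypothetical): propose a new auxiliary task as a random subgoal state."""
    return rng.choice(states)


def replace_low_utility(tasks, utilities, ages, rng, states,
                        replace_fraction=0.25, maturity=100):
    """Replacement step (assumed form): swap the weakest mature tasks for new ones.

    Only tasks at least `maturity` steps old are eligible, so that new tasks
    have time to induce features before being judged. `replace_fraction` and
    `maturity` are illustrative values, not taken from the paper.
    Returns the indices of the replaced tasks.
    """
    mature = [i for i in range(len(tasks)) if ages[i] >= maturity]
    n_replace = int(replace_fraction * len(tasks))
    # Rank eligible tasks by tester utility, lowest first.
    worst = sorted(mature, key=lambda i: utilities[i])[:n_replace]
    for i in worst:
        tasks[i] = generate_task(rng, states)  # generator proposes a fresh task
        utilities[i] = 0.0                     # the new task starts with no credit
        ages[i] = 0                            # and is protected until mature
    return worst


if __name__ == "__main__":
    rng = random.Random(0)
    tasks, utilities, ages = [1, 2, 3, 4], [0.5, 0.1, 0.9, 0.3], [200, 200, 200, 50]
    # Task 1 has the lowest utility among mature tasks; task 3 is too young.
    print(replace_low_utility(tasks, utilities, ages, rng, list(range(10))))
```

Protecting young tasks with an age threshold is one plausible design choice for a tester that needs time to observe a task's features; how the paper handles this is not stated in this summary.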

Abstract

In this paper, we explore an approach to auxiliary task discovery in reinforcement learning based on ideas from representation learning. Auxiliary tasks tend to improve data efficiency by forcing the agent to learn auxiliary prediction and control objectives in addition to the main task of maximizing reward, thus producing better representations. Typically these tasks are designed by people. Meta-learning offers a promising avenue for automatic task discovery; however, these methods are computationally expensive and challenging to tune in practice. In this paper, we explore a complementary approach to auxiliary task discovery: continually generating new auxiliary tasks and preserving only those with high utility. We also introduce a new measure of auxiliary tasks' usefulness based on how useful the features they induce are for the main task. Our discovery algorithm significantly outperforms random tasks and learning without auxiliary tasks across a suite of environments.
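The feature-based usefulness measure described in the abstract can be illustrated with a small sketch. It assumes a linear main-task head on top of the shared features, scores each feature by the summed magnitude of its outgoing main-task weights, and credits each auxiliary task with the utilities of the features it induced. The weight-magnitude score and the names `feature_utilities`, `task_utilities`, and `feature_owner` are assumptions for illustration; the paper's exact formula may differ.

```python
def feature_utilities(main_weights):
    """Assumed feature utility: summed |weight| from a feature to the main task's outputs.

    main_weights[a][f] is the weight from feature f to main-task output a
    (for example, the value of action a).
    """
    n_features = len(main_weights[0])
    return [sum(abs(row[f]) for row in main_weights) for f in range(n_features)]


def task_utilities(main_weights, feature_owner, n_tasks):
    """Assumed task utility: the sum of the utilities of the features the task induced.

    feature_owner[f] is the index of the auxiliary task whose gradient trains
    feature f; -1 marks features trained by the main task itself.
    """
    f_util = feature_utilities(main_weights)
    utils = [0.0] * n_tasks
    for f, owner in enumerate(feature_owner):
        if owner >= 0:
            utils[owner] += f_util[f]
    return utils


if __name__ == "__main__":
    # Two main-task outputs, four features; features 1-2 belong to task 0, feature 3 to task 1.
    w = [[0.5, -1.0, 0.0, 2.0],
         [0.5, 1.0, 0.25, -1.0]]
    print(task_utilities(w, [-1, 0, 0, 1], n_tasks=2))
```

Scoring features by how strongly the main task relies on them is one simple proxy for "usefulness to the main task"; it only makes sense when each feature has a single owner, which is what the Master-User strategy provides.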

Paper Structure

This paper contains 11 sections, 4 equations, 7 figures, and 1 algorithm.

Figures (7)

  • Figure 1: The forward pass, the backward pass for the main task, and the backward pass for auxiliary task $1$ under the Master-User strategy for learning auxiliary tasks alongside the main task. All features are used by all tasks in the forward pass, but each feature is modified only by the gradient backpropagated from a single task. The dotted arrows show stop-gradient connections; gradients do not propagate past these connections. Under the Master-User strategy, it is therefore clear which auxiliary task induced which feature.
  • Figure 2: Left: The four-rooms environment, with the subgoals of the good and bad hand-designed auxiliary tasks shown in red and blue, respectively. Middle: The hallway auxiliary tasks sped up learning, whereas the corner auxiliary tasks slowed learning in the early episodes. Right: The proposed tester evaluated the hand-designed auxiliary tasks correctly, assigning higher utility to the hallway auxiliary tasks. The results are averaged over 30 runs and the shaded regions depict the standard error.
  • Figure 3: The learning curves for the proposed generate-and-test method (green), the baseline with no auxiliary tasks (orange), and the baseline with fixed random auxiliary tasks (black). The results are averaged over $30$ runs and the shaded regions depict the standard error. The proposed generate-and-test method improved over the baseline with no auxiliary tasks and also outperformed the baseline with fixed random auxiliary tasks. Fixed random auxiliary tasks, in turn, improved performance over the baseline with no auxiliary tasks.
  • Figure 4: Example discovered auxiliary tasks in the three environments. Generate-and-test discovered reasonably good auxiliary tasks: In the gridworld environments, the subgoals corresponding to the discovered auxiliary tasks were close to the goal states. In the pinball environment, the discovered auxiliary tasks were more concentrated in the central areas.
  • Figure 5: The learning curves for the proposed generate-and-test method with the feature-attainment generator (lime green) and the baseline with no auxiliary tasks (orange). The results are averaged over $30$ runs and the shaded regions depict the standard error. The proposed generate-and-test method with the feature-attainment generator improved over the baseline with no auxiliary tasks and is potentially more scalable than random auxiliary task generation.
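The Master-User strategy shown in Figure 1 can be mimicked without an autodiff library by routing gradients by hand. In this sketch, the assumptions are linear features, squared-error losses standing in for the paper's learning targets, and the hypothetical names `forward`, `master_user_step`, and `owner`. Every task's output head reads all features, but each feature's input weights receive the gradient of only one task, which plays the role of the stop-gradient connections.

```python
def forward(W, x):
    """Shared features (linear for simplicity): h_j = sum_i W[j][i] * x[i]."""
    return [sum(wji * xi for wji, xi in zip(row, x)) for row in W]


def master_user_step(W, heads, owner, x, targets, lr=0.1):
    """One SGD step with Master-User gradient routing (illustrative sketch).

    heads[k]   -- output weights of task k (task 0 is the main task)
    owner[j]   -- the single task whose gradient may update feature j
    targets[k] -- regression target for task k (stand-in for a learning target)
    Loss for task k: 0.5 * (heads[k] . h - targets[k]) ** 2.
    """
    h = forward(W, x)
    errors = [sum(v * hj for v, hj in zip(heads[k], h)) - targets[k]
              for k in range(len(heads))]
    # Feature layer: row j gets d loss_{owner[j]} / d W[j] only; gradients
    # from every other task are stopped at feature j.
    new_W = []
    for j, row in enumerate(W):
        k = owner[j]
        g_h = errors[k] * heads[k][j]  # d loss_k / d h_j (uses pre-update head)
        new_W.append([w - lr * g_h * xi for w, xi in zip(row, x)])
    # Heads: every task uses all features in the forward pass.
    new_heads = [[v - lr * errors[k] * hj for v, hj in zip(heads[k], h)]
                 for k in range(len(heads))]
    return new_W, new_heads


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    heads = [[1.0, 0.0], [0.0, 1.0]]
    # Both features owned by the main task: task 1's error cannot reach feature 1.
    print(master_user_step(W, heads, [0, 0], [1.0, 2.0], [0.0, 0.0])[0])
```

Because feature `j` is trained only by `owner[j]`, any main-task utility attributed to feature `j` can be credited to exactly one task, which is what makes a feature-level usefulness measure well defined.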