Improving Reinforcement Learning Efficiency with Auxiliary Tasks in Non-Visual Environments: A Comparison

Moritz Lange; Noah Krystiniak; Raphael C. Engelhardt; Wolfgang Konen; Laurenz Wiskott

TL;DR

This work investigates decoupled state representation learning for non-visual RL by comparing common auxiliary tasks, using OFENet as the decoupled module. Across five environments, including the challenging FetchSlideDense-v1 task, the study finds that learning representations with auxiliary tasks can boost sample efficiency and maximum returns in complex settings, while offering little benefit for simple problems. Dynamics-focused tasks (forward state prediction and forward state difference prediction) generally outperform reward prediction. Decoupled representations can also make difficult tasks solvable for TD3, although results vary by RL algorithm. The findings support the development of interpretable, modular representation learning approaches to improve real-world RL applicability.

Abstract

Real-world reinforcement learning (RL) environments, whether in robotics or industrial settings, often involve non-visual observations and require RL approaches that are not only efficient but also reliable, and thus interpretable and flexible. To improve efficiency, agents that perform state representation learning with auxiliary tasks have been widely studied in visual observation contexts. However, for real-world problems, dedicated representation learning modules that are decoupled from RL agents are better suited to meeting these requirements. This study compares common auxiliary tasks based on, to the best of our knowledge, the only decoupled representation learning method for low-dimensional non-visual observations. We evaluate potential improvements in sample efficiency and returns for environments ranging from a simple pendulum to a complex simulated robotics task. Our findings show that representation learning with auxiliary tasks only provides performance gains in sufficiently complex environments and that learning environment dynamics is preferable to predicting rewards. These insights can inform future development of interpretable representation learning approaches for non-visual observations and advance the use of RL solutions in real-world scenarios.

Paper Structure (12 sections, 7 figures, 1 table)

This paper contains 12 sections, 7 figures, 1 table.

Introduction
Related Work
Auxiliary Tasks
Methods
Representation Learning Network
Reinforcement Learning Algorithms
Environments
Experiments
Results
Representation Learning for Different Types of Environments
Comparison of Auxiliary Tasks
Conclusion

Figures (7)

Figure 1: An overview of inputs and prediction targets of common auxiliary tasks.
Figure 2: Diagram of information flow in an actor-critic setup with the inv auxiliary task where the critic, or representation learning module, receives $o_t$ and $o_{t+1}$ (and potentially $a_t$) as input. If $o_{t+1}$ is part of the input to the critic, directly or through the module, the gradient of the critic loss cannot be propagated back to the actor as long as the environment (red) is not differentiable. Even if the action is additionally passed into the critic directly (dashed grey line) the actor will not get the true gradient.
Figure 3: Sketch of the OFENet architecture, modified from Ota et al. (2020). Observation $o_t$ and action $a_t$ are used to calculate representations $z_{o_t}$ and $z_{o_t, a_t}$. These are passed into the RL algorithm (light grey). The prediction target necessary to evaluate the auxiliary loss, e.g. $o_{t+1}$, is calculated with a fully connected layer (FC, light grey) from $z_{o_t, a_t}$.
Figure 4: Sample images rendered to visualize the environments. The image of FetchSlideDense-v1 is taken from Plappert et al. (2018).
Figure 5: Returns/success rates achieved with TD3 and different auxiliary tasks on various environments. The shaded areas show minimum and maximum performance achieved across 5 runs, while the lines represent the means. Values have been smoothed slightly for better visualisation.
...and 2 more figures
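
The prediction targets of the three auxiliary tasks named in the TL;DR (forward state prediction, forward state difference prediction, reward prediction) can be illustrated for a single transition $(o_t, a_t, r_t, o_{t+1})$. The following is a minimal sketch in plain Python, not the authors' implementation: the function names, the dictionary keys, and the use of a mean squared error loss are assumptions made for this example, and the transition values are invented.

```python
from typing import Dict, List, Sequence


def auxiliary_targets(
    o_t: Sequence[float], o_next: Sequence[float], r_t: float
) -> Dict[str, List[float]]:
    """Build the prediction target of each auxiliary task for one transition.

    The dictionary keys are hypothetical labels chosen for this sketch.
    """
    if len(o_t) != len(o_next):
        raise ValueError("o_t and o_next must have the same dimension")
    return {
        # Forward state prediction: predict the next observation itself.
        "forward_state": list(o_next),
        # Forward state difference: predict the change in the observation.
        "forward_state_difference": [n - c for c, n in zip(o_t, o_next)],
        # Reward prediction: predict the scalar reward of the transition.
        "reward": [r_t],
    }


def mse(prediction: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error, assumed here as the auxiliary loss."""
    if len(prediction) != len(target):
        raise ValueError("prediction and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)


# Invented example transition with a two-dimensional observation.
targets = auxiliary_targets(o_t=[1.0, 2.0], o_next=[1.5, 1.0], r_t=0.25)

# Loss of a trivial predictor that always outputs zeros for the difference task.
zero_loss = mse([0.0, 0.0], targets["forward_state_difference"])
```

In the architecture described in the Figure 3 caption, the prediction would come from a fully connected layer applied to $z_{o_t, a_t}$; here a fixed zero vector stands in for that output so the example stays self-contained.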
