Improving Reinforcement Learning Efficiency with Auxiliary Tasks in Non-Visual Environments: A Comparison
Moritz Lange, Noah Krystiniak, Raphael C. Engelhardt, Wolfgang Konen, Laurenz Wiskott
TL;DR
This work investigates decoupled state representation learning for non-visual RL by comparing common auxiliary tasks, using OFENet as the decoupled module. Across five environments, including the challenging FetchSlideDense-v1 task, the study finds that learning representations with auxiliary tasks can boost sample efficiency and maximum returns in complex settings, while offering little benefit for simple problems. Dynamics-focused tasks (forward state prediction and forward state difference prediction) generally outperform reward prediction, and decoupled representations can make otherwise difficult tasks solvable for TD3, although the size of the benefit depends on the RL algorithm. The findings support the development of interpretable, modular representation learning approaches to improve real-world RL applicability.
Abstract
Real-world reinforcement learning (RL) environments, whether in robotics or industrial settings, often involve non-visual observations and require RL approaches that are not only efficient but also reliable, and thus interpretable and flexible. To improve efficiency, agents that perform state representation learning with auxiliary tasks have been widely studied in visual observation contexts. However, for real-world problems, dedicated representation learning modules that are decoupled from RL agents are better suited to meet these requirements. This study compares common auxiliary tasks based on, to the best of our knowledge, the only decoupled representation learning method for low-dimensional non-visual observations. We evaluate potential improvements in sample efficiency and returns for environments ranging from a simple pendulum to a complex simulated robotics task. Our findings show that representation learning with auxiliary tasks provides performance gains only in sufficiently complex environments and that learning environment dynamics is preferable to predicting rewards. These insights can inform future development of interpretable representation learning approaches for non-visual observations and advance the use of RL solutions in real-world scenarios.
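To make the compared auxiliary tasks concrete, the following is a minimal sketch of the regression target each one would use on a batch of transitions. It is not the authors' implementation: the function names, the toy dynamics, the toy reward, and the use of a mean squared error are all assumptions made for illustration. In the setting described above, the prediction would come from a head on top of the learned representation; here only the targets and the loss are shown.

```python
import numpy as np


def auxiliary_targets(states, rewards, next_states):
    """Return the regression target of each auxiliary task (hypothetical helper)."""
    return {
        # Forward state prediction: predict the next observation itself.
        "forward_state": next_states,
        # Forward state difference: predict the change in observation.
        "forward_state_difference": next_states - states,
        # Reward prediction: predict the scalar reward of the transition.
        "reward": rewards.reshape(-1, 1),
    }


def auxiliary_loss(prediction, target):
    """Mean squared error; an assumed choice of loss for this sketch."""
    return float(np.mean((prediction - target) ** 2))


# Toy batch of 4 transitions with 3-dimensional observations (illustrative only).
rng = np.random.default_rng(0)
states = rng.normal(size=(4, 3))
actions = rng.normal(size=(4, 1))
next_states = states + 0.1 * actions          # assumed toy dynamics
rewards = -np.sum(next_states ** 2, axis=1)   # assumed toy reward

targets = auxiliary_targets(states, rewards, next_states)
for name, target in targets.items():
    # A predictor that always outputs zero serves as a trivial baseline.
    print(name, target.shape, auxiliary_loss(np.zeros_like(target), target))
```

The two dynamics-focused targets carry one value per observation dimension, whereas the reward target is a single scalar per transition. This difference in the amount of training signal is one intuition, offered here as a reading aid rather than a claim from the paper, for why dynamics prediction may yield richer representations.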
