Human-Inspired Framework to Accelerate Reinforcement Learning
Ali Beikmohammadi, Sindri Magnússon
TL;DR
The paper tackles RL sample inefficiency by introducing TA-Explore, a human-inspired curriculum that uses progressively more challenging auxiliary tasks with an annealed assistant reward to accelerate learning of the main objective. It defines a sequence of MDPs TA^e, indexed by e, whose rewards are convex combinations of the auxiliary and main rewards. It then shows that, with a decreasing weight β(e), the agent transfers knowledge from the simpler tasks to the main task in an algorithm-agnostic way. Empirical results on a simple Random Walk and on challenging linear and nonlinear control problems show faster convergence and robust performance at no extra computational cost, and the framework supports either value or policy transfer across RL methods. Limitations include the need to define suitable auxiliary goals and to tune β(e); future work proposes self-tuning β(e) and extending the framework to POMDPs and multi-agent settings.
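The convex combination above can be written out as follows. This rendering assumes that each TA^e shares states, actions, and dynamics with the main MDP and differs only in its reward. The symbols R^aux and R^main and the linear schedule with horizon E are illustrative notation, not necessarily the paper's.

```latex
R^{\mathrm{TA}}_{e}(s,a) = \beta(e)\, R^{\mathrm{aux}}(s,a) + \bigl(1-\beta(e)\bigr)\, R^{\mathrm{main}}(s,a),
\qquad 0 \le \beta(e) \le 1, \quad \beta(e+1) \le \beta(e),
\qquad \text{e.g. } \beta(e) = \max\!\left(0,\ 1 - \tfrac{e}{E}\right).
```

Under this example schedule, the agent optimizes only the auxiliary goal at e = 0 and only the main goal once e ≥ E. Because β(e) stays in [0, 1], every intermediate reward interpolates between the two tasks.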
Abstract
Reinforcement learning (RL) is crucial for decision-making in data science, but it suffers from sample inefficiency, particularly in real-world scenarios where physical interactions are costly. This paper introduces a novel human-inspired framework that improves the sample efficiency of RL algorithms. The framework first exposes the learning agent to simpler tasks and then progressively increases their complexity until the agent faces the main task. It requires no pre-training, and each simpler task is learned for just one iteration. The resulting knowledge supports various transfer learning approaches, such as value and policy transfer, without increasing computational complexity. The framework applies across different goals, environments, and RL algorithms, including value-based, policy-based, tabular, and deep RL methods. Experiments on a simple Random Walk and on more complex constrained optimal control problems show that the framework improves sample efficiency, especially when the main task is challenging.
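The minimal Python sketch below makes the mechanism concrete. Several of its assumptions do not come from the paper:

- The environment is a deterministic chain, a control variant of a random walk, with hypothetical states 1..N.
- A hypothetical dense auxiliary reward is paid for each rightward move.
- The main reward is +1 at the right end.
- β(e) follows a hypothetical linear schedule.
- Tabular Q-learning is the base algorithm.

Each task TA^e is trained for a single episode, and one Q-table is carried from task to task. That is one possible reading of "learning simpler tasks for just one iteration" combined with value transfer.

```python
import random

N = 7                 # hypothetical chain: non-terminal states 1..N, terminals 0 and N + 1
LEFT, RIGHT = 0, 1
START = (N + 1) // 2  # every episode starts in the middle of the chain


def step(s, a):
    """Deterministic chain dynamics; the episode ends at state 0 or N + 1."""
    s2 = s + 1 if a == RIGHT else s - 1
    return s2, s2 in (0, N + 1)


def main_reward(s, s2):
    # Main (sparse) task: +1 only for reaching the right terminal.
    return 1.0 if s2 == N + 1 else 0.0


def aux_reward(s, s2):
    # Hypothetical auxiliary (dense, easier) task: small reward for every rightward move.
    return 0.1 if s2 > s else 0.0


def beta(e, anneal_episodes=50):
    # Hypothetical linear schedule: beta(0) = 1, decreasing to 0 at e = anneal_episodes.
    return max(0.0, 1.0 - e / anneal_episodes)


def ta_reward(e, s, s2):
    # Reward of the e-th task TA^e: convex combination of auxiliary and main rewards.
    b = beta(e)
    return b * aux_reward(s, s2) + (1.0 - b) * main_reward(s, s2)


def greedy(Q, s, rng):
    # Argmax over the two actions, breaking ties at random.
    if Q[s][LEFT] == Q[s][RIGHT]:
        return rng.choice((LEFT, RIGHT))
    return RIGHT if Q[s][RIGHT] > Q[s][LEFT] else LEFT


def train(episodes=300, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    # One Q-table shared by TA^0, TA^1, ...: each task is trained for a single
    # episode, and its values are the starting point for the next task.
    Q = [[0.0, 0.0] for _ in range(N + 2)]
    for e in range(episodes):
        s, done = START, False
        while not done:
            # Epsilon-greedy behaviour policy.
            a = rng.choice((LEFT, RIGHT)) if rng.random() < eps else greedy(Q, s, rng)
            s2, done = step(s, a)
            r = ta_reward(e, s, s2)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])  # standard Q-learning update
            s = s2
    return Q


if __name__ == "__main__":
    Q = train()
    # Greedy action per non-terminal state ("R" = right, "L" = left).
    print(["R" if q[RIGHT] > q[LEFT] else "L" for q in Q[1:N + 1]])
```

For a main-task-only baseline, replace the `ta_reward(e, s, s2)` call in `train` with `main_reward(s, s2)`. The sketch itself makes no claim about how large any speed-up would be.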
