Leveraging Human Guidance for Deep Reinforcement Learning Tasks
Ruohan Zhang, Faraz Torabi, Lin Guan, Dana H. Ballard, Peter Stone
TL;DR
The paper surveys human-guided deep reinforcement learning approaches that go beyond traditional step-by-step action demonstrations. It focuses on five kinds of guidance: evaluative feedback, human preferences, hierarchical guidance, imitation from observation (IfO), and human attention. For each framework it analyzes how the guidance signal is defined, what assumptions are made, and how the method is implemented, and it shows how these signals can improve sample efficiency and performance on challenging tasks. Key contributions include mapping diverse feedback modalities to learning objectives, highlighting practical methods such as TAMER, COACH, preference-based RL, and IfO, and outlining future directions such as sharing human guidance data, better understanding human trainers, and a unified lifelong learning paradigm. The work emphasizes combining multiple types of human guidance to build more robust, scalable learning systems for complex environments.
Abstract
Reinforcement learning agents can learn to solve sequential decision tasks by interacting with the environment. Human knowledge of how to solve these tasks can be incorporated using imitation learning, where the agent learns to imitate human demonstrated decisions. However, human guidance is not limited to demonstrations. Other types of guidance could be more suitable for certain tasks and require less human effort. This survey provides a high-level overview of five recent learning frameworks that primarily rely on human guidance other than conventional, step-by-step action demonstrations. We review the motivation, assumptions, and implementation of each framework. We then discuss possible future research directions.
