GUIDE: Real-Time Human-Shaped Agents

Lingyu Zhang, Zhengran Ji, Nicholas R Waytowich, Boyuan Chen

TL;DR

GUIDE is a framework for real-time human-guided reinforcement learning. It collects continuous human feedback and grounds that feedback into dense rewards to accelerate policy learning, while a learned feedback simulator reduces the need for human input and allows training to continue.

Abstract

The recent rapid advancement of machine learning has been driven by increasingly powerful models and the growing availability of training data and computational resources. However, real-time decision-making tasks with limited time and sparse learning signals remain challenging. One way to improve the learning speed and performance of agents in such tasks is to leverage human guidance. In this work, we introduce GUIDE, a framework for real-time human-guided reinforcement learning that enables continuous human feedback and grounds such feedback into dense rewards to accelerate policy learning. Additionally, our method features a simulated feedback module that learns and replicates human feedback patterns in an online fashion, effectively reducing the need for human input while allowing continual training. We demonstrate the performance of our framework on challenging tasks with sparse rewards and visual observations. Our human study involving 50 subjects offers strong quantitative and qualitative evidence of the effectiveness of our approach. With only 10 minutes of human feedback, our algorithm achieves up to a 30% increase in success rate compared to its RL baseline.
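The abstract says continuous human feedback is grounded into dense rewards, and Figure 1 adds that these per-step rewards are combined with the environment reward. The exact grounding equation is not reproduced on this page, so what follows is only a minimal Python sketch under stated assumptions: each feedback value is normalized to [-1, 1] and added to the sparse environment reward with a hypothetical weight `feedback_weight`. The names `Step` and `shaped_rewards` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    env_reward: float      # sparse reward from the environment (often 0.0)
    human_feedback: float  # continuous feedback; assumed normalized to [-1, 1]


def shaped_rewards(steps: List[Step], feedback_weight: float = 1.0) -> List[float]:
    """Combine env reward with per-step human feedback.

    Additive combination and clipping are assumptions for illustration,
    not the paper's published formula.
    """
    out = []
    for s in steps:
        h = max(-1.0, min(1.0, s.human_feedback))  # clip to the assumed range
        out.append(s.env_reward + feedback_weight * h)
    return out


# Only the last step carries an environment reward, but every step gets a signal.
episode = [Step(0.0, 0.2), Step(0.0, -0.5), Step(1.0, 0.9)]
rewards = shaped_rewards(episode, feedback_weight=0.5)
print(rewards)
```

With sparse rewards, most steps would otherwise return zero; the added feedback term gives the learner a signal at every step, and the weight sets how strongly the trainer's judgment shapes the policy relative to the task reward.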

Paper Structure

This paper contains 20 sections, 3 equations, and 12 figures.

Figures (12)

  • Figure 1: GUIDE: Training consists of two stages. During the Human guidance stage, the human trainer observes the agent's state and action and provides real-time continuous feedback. The feedback values are grounded into per-step dense rewards and combined with the environment reward. Concurrently, we train a human feedback simulator that takes state-action pairs as input and regresses the feedback values. During the Automated guidance stage, the trained simulator stands in for the human and provides feedback to continue improving the policy, reducing human effort and cognitive load.
  • Figure 2: (A) Conventional discrete feedback. (B) Our continuous feedback. The histograms indicate the feedback distribution provided by the same subject on the same task. Continuous feedback carries more information from the human trainer.
  • Figure 3: Cognitive Tests: We conducted a series of cognitive tests to quantify how individual differences among subjects affect the performance of the agents they guided. (A) Eye Alignment (B) Reflex (C) Theory of Behavior (D) Mental Rotation (E) Mental Fitting (F) Spatial Mapping
  • Figure 4: GUIDE performance compared with other baselines. On challenging tasks, GUIDE consistently outperforms all other baselines. Agents guided by subjects with higher cognitive test scores also perform better, as shown in the top row (Top 15).
  • Figure 5: Exploration behavior of GUIDE and DDPG agents. In each plot, the x-axis is the step number within an episode and the y-axis is the ratio between the area of the visible view and the area of the entire input frame. The human-guided agent shows a stronger tendency to explore than the baseline RL agent.
  • ...and 7 more figures
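Figure 1 describes a human feedback simulator that takes state-action pairs, regresses the feedback values while the human is giving guidance, and then stands in for the human. The Python sketch below illustrates that two-stage flow under simplifying assumptions that are not from the paper: a linear model trained by online stochastic gradient descent on squared error replaces the paper's learned simulator, states and actions are small numeric vectors instead of visual observations, and the "human" is a hypothetical fixed linear rule used only to generate feedback. `FeedbackSimulator`, `features`, and `hidden_human_weights` are illustrative names.

```python
import random
from typing import List, Sequence


class FeedbackSimulator:
    """Hypothetical online regressor mapping (state, action) features to feedback.

    A linear model updated by per-sample SGD stands in for the paper's
    learned simulator; this is an assumption, not its architecture.
    """

    def __init__(self, dim: int, lr: float = 0.05) -> None:
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def predict(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x: Sequence[float], target: float) -> float:
        """One SGD step on 0.5 * (prediction - target)^2; returns the pre-update error."""
        err = self.predict(x) - target
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err
        return err


def features(state: Sequence[float], action: Sequence[float]) -> List[float]:
    # Concatenate state and action into a single input vector.
    return list(state) + list(action)


random.seed(0)
sim = FeedbackSimulator(dim=3)
hidden_human_weights = [0.5, -0.3, 0.8]  # hypothetical stand-in for a trainer's preferences

# Stage 1 (human guidance): every feedback value the human gives is also a training target.
for _ in range(2000):
    state = [random.uniform(-1, 1), random.uniform(-1, 1)]
    action = [random.uniform(-1, 1)]
    x = features(state, action)
    human_feedback = sum(w * xi for w, xi in zip(hidden_human_weights, x))
    sim.update(x, human_feedback)

# Stage 2 (automated guidance): the simulator supplies feedback in place of the human.
predicted = sim.predict(features([0.2, -0.1], [0.5]))
print(round(predicted, 3))
```

Because the simulator is updated on every labeled step during the first stage, it is ready the moment the human stops; its predictions can then be grounded into rewards exactly as human feedback was. A real trainer's feedback is noisy and depends on visual context, which is why the paper uses a learned model rather than a fixed linear rule.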