Online Behavior Modification for Expressive User Control of RL-Trained Robots

Isaac Sheidlower; Mavis Murdock; Emma Bethel; Reuben M. Aronson; Elaine Schaertl Short

TL;DR

This paper presents a behavior-diversity-based algorithm, Adjustable Control Of RL Dynamics (ACORD), demonstrates its applicability to online behavior modification in simulation and in a user study, and compares it to RL and Shared Autonomy.

Abstract

Reinforcement Learning (RL) is an effective method for robots to learn tasks. However, in typical RL, end-users have little to no control over how the robot does the task after the robot has been deployed. To address this, we introduce the idea of online behavior modification, a paradigm in which users have control over behavior features of a robot in real time as it autonomously completes a task using an RL-trained policy. To show the value of this user-centered formulation for human-robot interaction, we present a behavior-diversity-based algorithm, Adjustable Control Of RL Dynamics (ACORD), and demonstrate its applicability to online behavior modification in simulation and in a user study. In the study (n=23), users adjust the style of paintings as a robot traces a shape autonomously. We compare ACORD to RL and Shared Autonomy (SA), and show that ACORD affords user-preferred levels of control and expression, comparable to SA, but with the potential for the autonomous execution and robustness of RL.

Paper Structure (12 sections, 2 equations, 7 figures, 1 algorithm)

Introduction
Related Work
Learning Policies for Online Behavior Modification in RL Settings
ACORD for Continuous Control RL-tasks
ACORD Algorithm
On Using a Heuristic Progress Function
ACORD in Simulation
User Study
Conditions
Experimental Procedure
Results
Discussion

Figures (7)

Figure 1: A participant using ACORD to adjust the style of a painting as the robot traces a heart autonomously.
Figure 2: Left: The walking agent varies its behavior in a predictable and interpretable way in response to changes in k. The ghost traces from the previous six video frames show the agent’s change in speed. Right: The resulting manifold learned by ACORD in the walker environment. The speed is robust to different hull angles.
Figure 3: Overview of the study procedure. Participants interacted with each of the three conditions (order was counterbalanced), completing a survey after each condition.
Figure 4: Participant paintings. Users were able to produce a wide range of different styles for the pre-specified shapes, including the emergent "polka dot" style in SA (4th column from left) and widening or narrowing "strokes" using ACORD (rightmost column, top and center).
Figure 5: Responses to post-condition 5-point Likert scale questions. The darkest blue represents "strongly agree" or, in the case of Mental Demand, "very high." The darkest red represents "strongly disagree" or, in the case of Mental Demand, "very low."
...and 2 more figures
