Improving User Experience in Preference-Based Optimization of Reward Functions for Assistive Robots

Nathaniel Dennler, Zhonghao Shi, Stefanos Nikolaidis, Maja Matarić

TL;DR

This work designs an algorithm that generates trajectories for users to rank while prioritizing the user's experience of the preference learning process. Users found it more intuitive and easier to use than previous approaches across both physical and social robot tasks.

Abstract

Assistive robots interact with humans and must adapt to different users' preferences to be effective. An easy and effective technique to learn non-expert users' preferences is through rankings of robot behaviors, for example, robot movement trajectories or gestures. Existing techniques focus on generating trajectories for users to rank that maximize the outcome of the preference learning process. However, the generated trajectories do not appear to reflect the user's preference over repeated interactions. In this work, we design an algorithm to generate trajectories for users to rank that we call Covariance Matrix Adaptation Evolution Strategies with Information Gain (CMA-ES-IG). CMA-ES-IG prioritizes the user's experience of the preference learning process. We show that users find our algorithm more intuitive and easier to use than previous approaches across both physical and social robot tasks. This project's code is hosted at github.com/interaction-lab/CMA-ES-IG

Paper Structure

This paper contains 20 sections, 3 equations, 9 figures, 2 tables, 1 algorithm.

Figures (9)

  • Figure 1: The two domains in which users taught robots their preferences for the robots' behaviors. In the physical domain, users ranked a JACO arm's movement trajectories for handing them a marker, a cup, and a spoon. In the social domain, users ranked a Blossom robot's gestures for portraying happiness, sadness, and anger.
  • Figure 2: Example queries generated from an early step of each algorithm. The large circle represents the space of all trajectories, with lighter areas representing higher reward. Light blue arrows represent the user's true preference, dark blue arrows represent the current estimate of the user's preference, orange circles represent sampled trajectories to present to the user, and green dotted regions represent the sampling distribution from the current step of the CMA-ES optimizer. Information gain results in easy-to-differentiate queries, CMA-ES results in higher rewards on average, and CMA-ES-IG results in higher-reward queries that are also easy to differentiate.
  • Figure 3: Comparison of simulation results for learning user preferences. Shaded regions indicate standard error. We found that all methods were able to learn user preferences across varying dimensions. The quality of the trajectories in the query increases only for CMA-ES and CMA-ES-IG, with CMA-ES-IG performing significantly better.
  • Figure 4: The framework for learning user preferences. We learned nonlinear features for sets of robot trajectories. The query sampler produced sets of trajectories for the user to rank and those rankings were used to update the estimate of the user's preferences.
  • Figure 5: User study setup. Users interacted with the robots through the ranking interface to specify their preferences for how the Blossom robot used gestures to signal different affective states and how the JACO robot arm handed them different items.
  • ...and 4 more figures
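
The loop described in the Figure 4 caption (sample a query, collect a ranking, update the preference estimate) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a reward that is linear in trajectory features, a Plackett-Luce model of how users rank, and a particle-based belief over the preference vector. The query sampler here draws random Gaussian features as a placeholder for CMA-ES-IG and for the learned nonlinear features, and all function names are hypothetical.

```python
import numpy as np


def ranking_log_likelihood(weights, features, ranking):
    """Plackett-Luce log-likelihood of a ranking (best item first).

    Assumes reward = weights . features, which is an illustrative choice.
    """
    rewards = features[ranking] @ weights  # rewards ordered best to worst
    log_lik = 0.0
    for i in range(len(rewards) - 1):
        remaining = rewards[i:]
        peak = remaining.max()  # subtract the max for numerical stability
        log_lik += rewards[i] - (peak + np.log(np.exp(remaining - peak).sum()))
    return log_lik


def update_belief(particles, log_weights, features, ranking):
    """Bayesian reweighting of candidate preference vectors after one ranking."""
    log_liks = np.array(
        [ranking_log_likelihood(w, features, ranking) for w in particles]
    )
    updated = log_weights + log_liks
    return updated - np.logaddexp.reduce(updated)  # renormalize in log space


def simulate(dim=3, query_size=4, num_queries=30, num_particles=1000, seed=0):
    """Run the query/rank/update loop against a simulated, noiseless user."""
    rng = np.random.default_rng(seed)
    true_w = rng.normal(size=dim)
    true_w /= np.linalg.norm(true_w)

    # Uniform prior over unit-norm preference vectors, represented by particles.
    particles = rng.normal(size=(num_particles, dim))
    particles /= np.linalg.norm(particles, axis=1, keepdims=True)
    log_weights = np.full(num_particles, -np.log(num_particles))

    for _ in range(num_queries):
        # Placeholder query sampler: random features, NOT CMA-ES-IG.
        features = rng.normal(size=(query_size, dim))
        ranking = np.argsort(-(features @ true_w))  # simulated user's ranking
        log_weights = update_belief(particles, log_weights, features, ranking)

    estimate = np.exp(log_weights) @ particles  # posterior mean direction
    estimate /= np.linalg.norm(estimate)
    return float(estimate @ true_w)  # cosine similarity to the true preference


if __name__ == "__main__":
    print(f"alignment with true preference: {simulate():.3f}")
```

The returned value is the cosine similarity between the estimated and true preference vectors, so values near 1 indicate that the rankings were enough to recover the simulated user's preference. Swapping the placeholder sampler for a different query-generation strategy changes only the line that produces `features`.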