Improving through Interaction: Searching Behavioral Representation Spaces with CMA-ES-IG

Nathaniel Dennler, Zhonghao Shi, Yiran Tao, Andreea Bobu, Stefanos Nikolaidis, Maja Matarić

TL;DR

CMA-ES-IG explicitly incorporates user experience considerations into the preference learning process by suggesting perceptually distinct and informative trajectories for users to rank, and is preferred by non-expert users in identifying their preferred robot behaviors.

Abstract

Robots that interact with humans must adapt to individual users' preferences to operate effectively in human-centered environments. An intuitive and effective technique to learn non-expert users' preferences is through rankings of robot behaviors, e.g., trajectories, gestures, or voices. Existing techniques primarily focus on generating queries that optimize preference learning outcomes, such as sample efficiency or final preference estimation accuracy. However, the focus on outcome overlooks key user expectations in the process of providing these rankings, which can negatively impact users' adoption of robotic systems. This work proposes the Covariance Matrix Adaptation Evolution Strategies with Information Gain (CMA-ES-IG) algorithm. CMA-ES-IG explicitly incorporates user experience considerations into the preference learning process by suggesting perceptually distinct and informative trajectories for users to rank. We demonstrate these benefits through both simulated studies and real-robot experiments. CMA-ES-IG, compared to state-of-the-art alternatives, (1) scales more effectively to higher-dimensional preference spaces, (2) maintains computational tractability for high-dimensional problems, (3) is robust to noisy or inconsistent user feedback, and (4) is preferred by non-expert users in identifying their preferred robot behaviors. This project's code is available at github.com/interaction-lab/CMA-ES-IG

Paper Structure

This paper contains 26 sections, 15 equations, 8 figures, 12 tables, 1 algorithm.

Figures (8)

  • Figure 1: Query Generation Techniques. The spheres on the top row represent a 3-dimensional trajectory feature space, with a blue arrow indicating the user's true preference and the red arrow indicating the current estimated preference. The red circle denotes the equator of the sphere defined by the red preference estimate. The red cloud of points denotes the sampling space for CMA-ES. Optimizing for information gain results in trajectories that are spaced evenly across the red circle, which are easy for the user to distinguish but do not achieve high rewards (a). Using covariance matrix adaptation evolution strategies (CMA-ES) results in trajectories with higher rewards, but they are not easy for the user to distinguish (b). CMA-ES with information gain (CMA-ES-IG) generates trajectories that are both easy for the user to distinguish and achieve high reward.
  • Figure 2: Quality of suggested trajectories over time. Across all dimensions, CMA-ES-IG (orange) generates higher-quality trajectories (i.e., trajectories that receive higher reward on average) for users to rank compared to CMA-ES (blue) and Infogain (gray).
  • Figure 3: Simulated Domains. We evaluated our algorithms across four simulated domains that represent physical (a,b) and social (c,d) preference-based robot tasks. In the Lunar Lander domain (a), preference determines the path the spaceship takes to land within the flags. In the Driving domain (b), preference determines how the autonomous vehicle merges. In the Face Design domain (c), preference determines the appearance of a screen-based robot face. In the Voice Design domain (d), preference determines the sound of a text-to-speech voice.
  • Figure 4: Quality of suggested trajectories over time for simulated robotic environments. For each simulated robot domain, CMA-ES-IG significantly outperforms the baseline algorithms by producing higher-quality trajectory queries in earlier iterations. This result demonstrates that CMA-ES-IG consistently suggests high-quality trajectories under a variety of representation spaces.
  • Figure 5: The two domains in which users taught robots their preferences for the robots' behaviors. In the physical domain (a), users ranked a JACO arm's movement trajectories to hand them a marker, a cup, and a spoon. In the social domain (b), users ranked a Blossom robot's gestures to portray happiness, sadness, and anger.
  • ...and 3 more figures
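
The intuition in the Figure 1 caption (sample candidates near the current preference estimate, then show the user a small set that is easy to tell apart) can be illustrated with a minimal sketch. This is not the authors' implementation: the isotropic Gaussian is an assumed stand-in for the adapted CMA-ES search distribution, greedy farthest-point selection is an assumed proxy for the information-gain objective, and all names and values (`sample_candidates`, `select_distinct`, `estimate`, `sigma`) are hypothetical.

```python
import math
import random


def sample_candidates(mean, sigma, n, rng):
    """Draw n feature vectors from an isotropic Gaussian around `mean`.

    Assumption: a stand-in for the CMA-ES search distribution, which would
    use an adapted full covariance matrix instead of a single `sigma`.
    """
    return [[rng.gauss(m, sigma) for m in mean] for _ in range(n)]


def select_distinct(candidates, k):
    """Greedy farthest-point selection of k mutually distant candidates.

    Assumption: spread in feature space is used as a simple proxy for
    "easy to distinguish"; it is not the information-gain objective itself.
    Requires k <= len(candidates).
    """
    chosen = [0]  # start from the first candidate
    # Distance from every candidate to its nearest already-chosen candidate.
    nearest = [math.dist(c, candidates[0]) for c in candidates]
    while len(chosen) < k:
        # Pick the candidate farthest from everything chosen so far.
        idx = max(range(len(candidates)), key=lambda i: nearest[i])
        chosen.append(idx)
        for i, c in enumerate(candidates):
            nearest[i] = min(nearest[i], math.dist(c, candidates[idx]))
    return [candidates[i] for i in chosen]


def min_pairwise_distance(points):
    """Smallest distance between any two points in the set."""
    return min(
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )


rng = random.Random(0)
estimate = [0.5, -0.2, 0.1]          # hypothetical current preference estimate
pool = sample_candidates(estimate, sigma=0.3, n=200, rng=rng)
query = select_distinct(pool, k=4)   # 4 trajectories shown to the user
naive = pool[:4]                     # first 4 raw samples, for comparison
```

Comparing `min_pairwise_distance(query)` with `min_pairwise_distance(naive)` gives a rough sense of how much more spread out the selected query is than raw samples from the same distribution. Farthest-point selection guarantees a spread within a factor of two of the best achievable for the pool.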