PECAN: Personalizing Robot Behaviors through a Learned Canonical Space

Heramb Nemlekar; Robert Ramirez Sanchez; Dylan P. Losey

TL;DR

PECAN tackles the problem of directly personalizing robot behavior across multiple tasks by learning a canonical style space that encodes user preferences as a continuous latent variable. It presents a two-encoder–decoder architecture with a discrete task space $Z_{\tau}$ and a continuous style space $Z_{\theta}$, trained with a reconstruction loss and a semi-supervised cross-entropy loss that anchors style extremes at opposite corners of the canonical space. The key contributions are the design of a user-friendly, monotonic, and consistent style space, the use of weak supervision to disentangle tasks and styles, and empirical validation through simulations and two human-robot interaction studies showing improved intuitiveness and faster personalization compared to baselines. Practically, PECAN enables direct, rapid, and cross-task personalization with minimal user input, offering a scalable approach for adapting robot behaviors to individual users in shared tasks and real-world deployments.
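The training objective summarized above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the reconstruction loss is a mean squared error, that the two terms are combined with a scalar `weight`, and that each demonstration arrives as a dictionary with the hypothetical keys `xi`, `xi_hat`, `logits`, and `label`. Only labeled demonstrations contribute the cross-entropy term.

```python
import math

def reconstruction_loss(xi, xi_hat):
    """Mean squared error between a demonstration and its reconstruction
    (MSE is an assumption; the source only says 'reconstruction loss')."""
    return sum((a - b) ** 2 for a, b in zip(xi, xi_hat)) / len(xi)

def cross_entropy(logits, label):
    """Cross-entropy of a softmax over classifier logits for one labeled demo."""
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_norm - logits[label]

def semi_supervised_loss(batch, weight=1.0):
    """Average loss over a batch: every demo is reconstructed, and
    only demos with a style label add the weighted classification term.
    The `weight` parameter is a hypothetical trade-off knob."""
    total = 0.0
    for demo in batch:
        total += reconstruction_loss(demo["xi"], demo["xi_hat"])
        if demo.get("label") is not None:
            total += weight * cross_entropy(demo["logits"], demo["label"])
    return total / len(batch)
```

In this sketch an unlabeled demonstration that is reconstructed perfectly contributes zero loss, while a labeled one is additionally penalized when the classifier assigns low probability to its style label.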

Abstract

Robots should personalize how they perform tasks to match the needs of individual human users. Today's robots achieve this personalization by asking for the human's feedback in the task space. For example, an autonomous car might show the human two different ways to decelerate at stoplights, and ask the human which of these motions they prefer. This current approach to personalization is indirect: based on the behaviors the human selects (e.g., decelerating slowly), the robot tries to infer their underlying preference (e.g., defensive driving). By contrast, our paper develops a learning and interface-based approach that enables humans to directly indicate their desired style. We do this by learning an abstract, low-dimensional, and continuous canonical space from human demonstration data. Each point in the canonical space corresponds to a different style (e.g., defensive or aggressive driving), and users can directly personalize the robot's behavior by simply clicking on a point. Given the human's selection, the robot then decodes this canonical style across each task in the dataset -- e.g., if the human selects a defensive style, the autonomous car personalizes its behavior to drive defensively when decelerating, passing other cars, or merging onto highways. We refer to our resulting approach as PECAN: Personalizing Robot Behaviors through a Learned Canonical Space. Our simulations and user studies suggest that humans prefer using PECAN to directly personalize robot behavior (particularly when those users become familiar with PECAN), and that users find the learned canonical space to be intuitive and consistent. See videos here: https://youtu.be/wRJpyr23PKI

Paper Structure (19 sections, 12 equations, 11 figures, 2 tables)

Introduction
Related Work
Problem Statement
Learning a Canonical Style Space
Separately Encoding Tasks and Styles
Characteristics of a User-Friendly Canonical Space
Semi-supervised Learning
Simulation Experiments
User study
Learning User-Friendly Canonical Spaces
Direct vs. Indirect Personalization
Follow-up Study: Direct vs. Indirect Personalization
Practical Considerations
Conclusion
Appendix
...and 4 more sections

Figures (11)

Figure 1: User personalizing the driving style of an autonomous car. With existing methods, the user provides feedback about their preferred behavior, and the robot indirectly estimates their style based on this feedback. We propose a direct approach where users select their style in a canonical space, and the robot applies this user-chosen style across each task it encounters.
Figure 2: Proposed architecture for Personalizing Robot Behaviors through a Learned Canonical Space (PECAN). (Left) The robot uses a task encoder $\psi_{\tau}$ and a style encoder $\psi_{\theta}$ to embed input demonstrations $\xi \in \mathcal{D}$ into two low-dimensional spaces: a latent task space $Z_{\tau}$ and a latent style space $Z_{\theta}$. A decoder network $\phi$ takes the combined latent tasks and styles as input and reproduces the input demonstrations. For labeled demonstrations, a classifier network $\Delta$ predicts the class labels from their latent styles. We train both the encoders and the decoder to accurately reconstruct the demonstrations. Simultaneously, we also train the style encoder along with the classifier such that it assigns similar latent values to trajectories with the same label. (Right) We show that when labeled demonstrations represent the extreme ends of the style spectrum, the canonical space is organized so that the latent styles of these extremes are positioned at the corners. This arrangement allows users to interpolate between the extremes by choosing intermediate latent values.
Figure 3: Simulation results for autonomous driving (Top row) and robot manipulation (Bottom row) environments. We compare our proposed approach, PECAN, to a state-of-the-art baseline, SeGMA, and ablations of our approach, Ours-L and Ours-X. While SeGMA uses task labels, PECAN uses labels for trajectories with similar styles (specifically the style extremes). Both ablations use the same architecture as PECAN; however, Ours-L does not train with any labels, whereas Ours-X uses labels for trajectories with intermediate styles (instead of the style extremes). In both environments, PECAN achieves comparable Task Accuracy to SeGMA. Although PECAN has significantly lower Trajectory Error in the robot environment, its performance is similar to the baselines in the driving environment. The main advantage of PECAN over the baseline methods is that the canonical spaces learned by our approach are more consistent and monotonic (i.e., more user-friendly). An asterisk (*) denotes statistical significance and the error bars indicate standard error.
Figure 4: Interfaces for personalizing the behavior of the robot (Left) and the autonomous car (Right) in our user studies. In the robot study, the canonical space was a 1D line which represented the distance that the robot maintains from the user. Users personalized the style of the robot's trajectory by moving the slider along the line. For tasks 1 and 2, users could visualize the robot's trajectory in a PyBullet simulation before executing it on the robot in the real world. In the car study, the canonical space was a 2D square which captured the maximum speed of the autonomous car and the minimum distance it maintains from other cars on the road. Users personalized the car's driving style by selecting different points inside the square. Since the driving environment was entirely in simulation, there was no need to visualize the car's trajectory separately before execution. Figure \ref{fig:simulation} shows examples of the simulated car in the Highway and Intersection tasks.
Figure 5: Objective and subjective results from the robot user study. (Left) User applies the same latent value $z_{\theta}$ from Place Cup to Pour Coffee, expecting a similar style across both tasks. PECAN produces trajectories with similar distances $\theta$ for Place Cup and Pour Coffee, but SeGMA generates a straight-line trajectory for Pour Coffee. (Top Right) When using PECAN, participants had lower task error, style error, and fewer real attempts (t(13) = -3.12, p < 0.01) when performing new tasks without visual information. (Bottom Right) Subjectively, participants prefer (t(13) = 2.55, p < 0.05) working with PECAN compared to SeGMA, as they found PECAN more intuitive (t(13) = 3.35, p < 0.01) and easier to use (t(13) = 2.68, p < 0.05), especially without visual information (t(13) = 2.76, p < 0.05). An asterisk (*) denotes statistical significance and the error bars indicate standard error.
...and 6 more figures
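The direct personalization loop described in the captions for Figures 2 and 4 (the user picks a point in the canonical space, and the robot decodes that one style for every task) can be sketched as below. This is an illustration under stated assumptions rather than the paper's code: the canonical space is assumed to be the unit square, `click_to_style` and `personalize` are hypothetical helper names, and `toy_decoder` is a made-up stand-in for the learned decoder $\phi$ so that the example runs on its own.

```python
from typing import Callable, Dict, List, Tuple

Style = Tuple[float, float]
Trajectory = List[float]

def click_to_style(x: float, y: float) -> Style:
    """Clamp a click to the unit square, assumed here to bound the
    2D canonical space with the labeled style extremes at its corners."""
    def clamp(v: float) -> float:
        return min(1.0, max(0.0, v))
    return (clamp(x), clamp(y))

def personalize(
    z_theta: Style,
    task_latents: Dict[str, int],
    decoder: Callable[[int, Style], Trajectory],
) -> Dict[str, Trajectory]:
    """Decode the same user-chosen style z_theta for every task latent z_tau."""
    return {task: decoder(z_tau, z_theta) for task, z_tau in task_latents.items()}

def toy_decoder(z_tau: int, z_theta: Style) -> Trajectory:
    """Hypothetical stand-in for the learned decoder: a three-step
    trajectory that depends on both the task and the style."""
    speed, gap = z_theta
    return [z_tau + speed * t + gap for t in range(3)]

# One selection in the canonical space is reused across all tasks.
style = click_to_style(1.4, -0.2)
behaviors = personalize(style, {"Highway": 0, "Intersection": 1}, toy_decoder)
```

The point of the sketch is the data flow: the style latent is chosen once and held fixed, while only the task latent changes between calls to the decoder.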
