
Autonomous Human-Robot Interaction via Operator Imitation

Sammy Christen, David Müller, Agon Serifi, Ruben Grandia, Georg Wiedebach, Michael A. Hopkins, Espen Knoop, Moritz Bächer

TL;DR

This work addresses autonomous human-robot interaction by learning to imitate operator commands from a small, mood-diverse dataset. It introduces a unified transformer that combines diffusion-based prediction of continuous commands with a classifier for discrete events, conditioned on the human pose expressed relative to the robot. The approach produces mood-expressive autonomous interactions that are comparable to an expert-operator baseline, both in simulation and in a user study on the real robot, and it transfers zero-shot to a different robot platform. Because the method imitates operator commands rather than low-level actuation, it offers data efficiency, safety guarantees, and cross-embodiment applicability for expressive HRI.
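The model is conditioned on robot-relative human pose. A minimal sketch of that transform, assuming planar poses given as (x, y, yaw) in a shared world frame (the paper may use a richer, full-body representation, in which case the same rotation would be applied to every tracked point): subtract the robot position, then rotate by the negative robot heading so the robot sits at the origin facing +x. The function names are hypothetical.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def to_robot_frame(human, robot):
    """Express a planar human pose in the robot's local frame.

    Assumption: both poses are (x, y, yaw) tuples in the same world frame.
    The result places the robot at the origin, facing the +x axis.
    """
    hx, hy, hyaw = human
    rx, ry, ryaw = robot
    # World-frame offset from robot to human.
    dx, dy = hx - rx, hy - ry
    c, s = math.cos(ryaw), math.sin(ryaw)
    # Rotate the offset by -robot_yaw to get robot-frame coordinates.
    x_rel = c * dx + s * dy
    y_rel = -s * dx + c * dy
    # Human heading relative to the robot heading.
    return x_rel, y_rel, wrap_angle(hyaw - ryaw)
```

For a robot at (1, 0) facing +y and a human standing at (1, 2), the human lands about 2 m straight ahead, at roughly (2, 0) in the robot frame. Inputs in this form do not depend on where in the room the interaction happens, which is one plausible reason to prefer them when the dataset is small.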

Abstract

Teleoperated robotic characters can perform expressive interactions with humans, relying on the operators' experience and social intuition. In this work, we propose to create autonomous interactive robots by training a model to imitate operator data. Our model is trained on a dataset of human-robot interactions in which an expert operator is asked to vary the interactions and mood of the robot, while the operator commands and the poses of the human and robot are recorded. Our approach learns to predict continuous operator commands through a diffusion process and discrete commands through a classifier, both unified within a single transformer architecture. We evaluate the resulting model in simulation and with a user study on the real system. We show that our method enables simple autonomous human-robot interactions that are comparable to the expert-operator baseline, and that users can recognize the different robot moods generated by our model. Finally, we demonstrate a zero-shot transfer of our model onto a different robotic platform with the same operator interface.
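The abstract does not state the training objective. One common way to train a single network that both denoises continuous commands and classifies discrete ones is to sum a noise-prediction loss with a cross-entropy loss. The sketch below assumes that form: the clean command x_0 is noised as x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, the network is asked to recover ε, and the discrete head is scored with cross-entropy. The equal weighting `cls_weight=1.0`, the vector sizes, and all function names are hypothetical, and the transformer itself is replaced by placeholder outputs.

```python
import math
import random


def add_noise(x0, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * a + math.sqrt(1.0 - alpha_bar) * e
          for a, e in zip(x0, eps)]
    return xt, eps


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def cross_entropy(logits, label):
    """Negative log-softmax of the true class, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def joint_loss(eps_hat, eps, logits, label, cls_weight=1.0):
    """Hypothetical combined objective: denoising MSE plus weighted classification CE."""
    return mse(eps_hat, eps) + cls_weight * cross_entropy(logits, label)


# Toy training step. A real model would map (x_t, t, condition) to
# (eps_hat, logits); both outputs are placeholders here.
rng = random.Random(0)
x0 = [0.3, -0.1, 0.05]          # hypothetical continuous command vector
xt, eps = add_noise(x0, alpha_bar=0.5, rng=rng)
eps_hat = [0.0] * len(x0)       # placeholder noise prediction
logits = [0.0, 0.0, 0.0, 0.0]   # placeholder scores for 4 discrete classes
loss = joint_loss(eps_hat, eps, logits, label=2)
```

Sharing one backbone and summing the two losses lets both heads see the same conditioning (past human poses and commands); how the terms are actually weighted in the paper is not stated here.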

Paper Structure

This paper contains 25 sections, 4 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Overview of our approach. We first collect a human-robot interaction dataset with a remote operator (red, top left). A motion control module takes operator commands and maps them to robot actions (cyan, middle). We then train a diffusion-based model that learns to imitate the operator (blue, bottom left). Our system learns to perform simple interactions with a human and express different moods. Dashed lines indicate components required only for data collection and training.
  • Figure 2: Our capture setup. An operator controls the robot to interact with a human participant. The poses of the human and robot, as well as the operator commands, are recorded in a motion capture studio.
  • Figure 3: Overview of our method architecture. We use conditions in the form of past human poses and commands (yellow). A diffusion model predicts operator commands to control the robot model (blue). The transformer also outputs discrete predictions for different behaviors and the mode (red).
  • Figure 4: Example behavior of different moods. In the angry mood, the robot refuses interactions and steps away from the human (top). In the sad mood, the robot turns away from the human, has its head tilted towards the ground and occasionally shakes its head (bottom).
  • Figure 5: Diversity of our framework. Given the same starting point (black trapezoid) and a fixed human position (black star), we run our model multiple times and plot the resulting x-y positions. The model generates different behaviors for the same human pose conditions.
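Figure 5 shows different behavior from repeated runs under identical conditions. In a diffusion model this diversity comes from drawing fresh Gaussian noise at the start and at every reverse step. The sketch below is a standard DDPM-style ancestral sampler over a hypothetical 2-D command; the paper's sampler and schedule may differ. The learned transformer is replaced by `toy_eps_predictor`, an analytic stand-in that is exact only when the clean command is Gaussian around a condition-dependent mean `cond_mean` with spread `data_std`. The 50-step linear schedule and all names are assumptions.

```python
import math
import random

# Assumed noise schedule: 50 linearly spaced betas.
T = 50
BETAS = [1e-4 + (0.2 - 1e-4) * i / (T - 1) for i in range(T)]
ALPHAS = [1.0 - b for b in BETAS]
ALPHA_BARS = []
_prod = 1.0
for _a in ALPHAS:
    _prod *= _a
    ALPHA_BARS.append(_prod)  # cumulative product of alphas up to step t


def toy_eps_predictor(xt, t, cond_mean, data_std):
    """Hypothetical stand-in for the learned denoiser.

    Returns the exact E[eps | x_t] when each clean command dimension is
    N(cond_mean, data_std**2); a trained network would approximate this.
    """
    ab = ALPHA_BARS[t]
    scale = math.sqrt(1.0 - ab) / (ab * data_std ** 2 + 1.0 - ab)
    return [scale * (x - math.sqrt(ab) * m) for x, m in zip(xt, cond_mean)]


def sample(cond_mean, data_std, seed):
    """Ancestral sampling: start from pure noise and denoise step by step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in cond_mean]
    for t in reversed(range(T)):
        eps_hat = toy_eps_predictor(x, t, cond_mean, data_std)
        coef = BETAS[t] / math.sqrt(1.0 - ALPHA_BARS[t])
        mean = [(xi - coef * ei) / math.sqrt(ALPHAS[t])
                for xi, ei in zip(x, eps_hat)]
        if t > 0:
            # Fresh noise at every step except the last: the source of diversity.
            sigma = math.sqrt(BETAS[t])
            x = [mi + sigma * rng.gauss(0.0, 1.0) for mi in mean]
        else:
            x = mean
    return x


# Same condition, different seeds: different but plausible commands.
cond = [0.5, 0.2]  # hypothetical mean command implied by the human pose
samples = [sample(cond, data_std=0.3, seed=s) for s in range(5)]
```

Each call with a new seed returns a different command, while the samples as a whole stay centered on the conditional mean. Fixing the seed makes a rollout reproducible, which is useful when comparing behaviors across moods.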