
Self-Supervised Prediction of the Intention to Interact with a Service Robot

Gabriele Abbate, Alessandro Giusti, Viktor Schmuck, Oya Celiktutan, Antonio Paolillo

TL;DR

This work proposes a learning-based approach to predict the probability that a human user will interact with a robot before the interaction actually begins. It shows that the approach can learn without external supervision and achieve accurate classification.

Abstract

A service robot can provide a smoother interaction experience if it can proactively detect whether a nearby user intends to interact, so that it can adapt its behavior, e.g., by explicitly showing that it is available to provide a service. In this work, we propose a learning-based approach to predict the probability that a human user will interact with a robot before the interaction actually begins. The approach is self-supervised because, after each encounter with a human, the robot can automatically label the encounter according to whether it resulted in an interaction. We explore different classification approaches, using different sets of features that describe the pose and the motion of the user. We validate and deploy the approach in three scenarios. The first comprises $3442$ natural sequences (both interacting and non-interacting) of employees in an office break area. This is a challenging real-world setting in which a coffee machine takes the place of a service robot. The other two scenarios involve researchers interacting with service robots ($200$ and $72$ sequences, respectively). Results show that, even in challenging real-world settings, our approach can learn without external supervision. It also classifies the user's intention to interact accurately (i.e., AUROC greater than $0.9$) more than $3$ s before the interaction actually occurs.
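The self-supervised labeling step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. We assume that each encounter is a tracked sequence of per-frame feature vectors with timestamps, and that the robot records an `interaction_time` whenever the encounter ends in an interaction. Every frame observed before that moment inherits the encounter's outcome as its label. The `Encounter` class and `label_encounters` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Encounter:
    """One tracked person near the robot (hypothetical structure)."""
    timestamps: List[float]                   # seconds, one entry per frame
    features: List[List[float]]               # per-frame features (e.g. pose, motion)
    interaction_time: Optional[float] = None  # set by the robot if an interaction occurred


def label_encounters(encounters: List[Encounter]) -> Tuple[List[List[float]], List[int]]:
    """Turn finished encounters into (features, label) training pairs without human annotation.

    Assumption: each frame gets the outcome of its whole encounter as label
    (1 = interacted, 0 = did not), and frames at or after the interaction
    are discarded because the goal is to predict the interaction in advance.
    """
    X: List[List[float]] = []
    y: List[int] = []
    for enc in encounters:
        interacted = enc.interaction_time is not None
        for t, f in zip(enc.timestamps, enc.features):
            if interacted and t >= enc.interaction_time:
                break  # stop once the interaction has started
            X.append(f)
            y.append(1 if interacted else 0)
    return X, y
```

For example, a person who starts interacting at $t = 1.2$ s contributes only the frames recorded before $1.2$ s as positives. A passerby who never interacts contributes all of their frames as negatives.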

Paper Structure (20 sections, 6 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: For a robot providing information in a corridor of a public building (top) or serving a chocolate treat to a passerby (bottom), it is crucial to proactively detect the human intention to interact even when the user is still at a distance, in order to adopt behaviors perceived as friendly, demonstrate availability to interact, and more efficiently offer the relevant services.
  • Figure 2: To collect data, the motion of people walking in a break area is monitored to predict their intention to interact with a coffee machine.
  • Figure 3: Coffee break scenario: AUROC of the classifiers for the different models (from top to bottom: LC, RF, MLP, and LSTM). Each model is tested in different ranges of social distance (from left to right, ranging from below $0.75$ m to above $3.5$ m) and on average over all distance ranges (last column), using different sets of features ($\bm{f}_1$ to $\bm{f}_6$, shown as bars from left to right within each histogram). The horizontal dotted line denotes the performance of a non-informative classifier (AUROC = 0.5).
  • Figure 4: Coffee break scenario: ROC curve for sequence-level performance (left); precision, recall, and advance detection time with respect to the classifier threshold (right).
  • Figure 5: Coffee break scenario: AUROC of the model on each day of the self-supervised learning experiment (see text). Boxplots report statistics over 20 runs of the experiment.
  • ...and 3 more figures
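Figures 3 and 4 report AUROC, together with precision, recall, and advance detection time as functions of the classifier threshold. The sketch below shows one way such metrics could be computed, under stated assumptions. It is not the paper's evaluation code.

AUROC is computed with its standard rank interpretation: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counting one half. A non-informative classifier therefore scores 0.5.

The sequence-level rule is an assumption. A sequence raises an alarm at the first frame whose predicted probability reaches the threshold, and the advance time is the interaction time minus that alarm time. `auroc`, `first_alarm`, and `sequence_metrics` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def first_alarm(times: Sequence[float], probs: Sequence[float], threshold: float) -> Optional[float]:
    """Time of the first frame whose probability reaches the threshold (assumed decision rule)."""
    for t, p in zip(times, probs):
        if p >= threshold:
            return t
    return None


def sequence_metrics(
    sequences: List[Tuple[List[float], List[float], Optional[float]]],
    threshold: float,
) -> Tuple[float, float, float]:
    """Precision, recall and mean advance time (s) over sequences.

    Each sequence is (frame times, predicted probabilities, interaction time or None);
    frames of interacting sequences are assumed to end before the interaction.
    """
    tp = fp = fn = 0
    advances: List[float] = []
    for times, probs, t_int in sequences:
        alarm = first_alarm(times, probs, threshold)
        if t_int is not None:
            if alarm is not None:
                tp += 1
                advances.append(t_int - alarm)  # how early the intention was detected
            else:
                fn += 1
        elif alarm is not None:
            fp += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    mean_advance = sum(advances) / len(advances) if advances else 0.0
    return precision, recall, mean_advance
```

To obtain curves like those in the right panel of Figure 4, `sequence_metrics` would be evaluated over a sweep of thresholds. Under this rule, raising the threshold tends to trade recall and advance time for precision.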