Few-Shot Inference of Human Perceptions of Robot Performance in Social Navigation Scenarios

Qiping Zhang; Nathan Tsoi; Mofeed Nagib; Hao-Tien Lewis Chiang; Marynel Vázquez

TL;DR

The paper investigates using few-shot in-context learning with large language models to predict human perceptions of robot performance in social navigation. By augmenting the SEAN TOGETHER dataset, it demonstrates that LLMs can match or exceed traditional supervised models with an order of magnitude fewer labeled examples, and that prediction improves with more in-context demonstrations. It also analyzes which sensor-based observations drive predictions and shows personalized demonstrations further enhance accuracy, highlighting a scalable, user-centered pathway for evaluating and improving robot behavior in real-world settings. The work points to future extensions with multimodal data and adaptive robot policies that respond to predicted user perceptions.

Abstract

Understanding how humans evaluate robot behavior during human-robot interactions is crucial for developing socially aware robots that behave according to human expectations. While the traditional approach to capturing these evaluations is to conduct a user study, recent work has proposed utilizing machine learning instead. However, existing data-driven methods require large amounts of labeled data, which limits their use in practice. To address this gap, we propose leveraging the few-shot learning capabilities of Large Language Models (LLMs) to improve how well a robot can predict a user's perception of its performance, and study this idea experimentally in social navigation tasks. To this end, we extend the SEAN TOGETHER dataset with additional real-world human-robot navigation episodes and participant feedback. Using this augmented dataset, we evaluate the ability of several LLMs to predict human perceptions of robot performance from a small number of in-context examples, based on observed spatio-temporal cues of the robot and surrounding human motion. Our results demonstrate that LLMs can match or exceed the performance of traditional supervised learning models while requiring an order of magnitude fewer labeled instances. We further show that prediction performance can improve with more in-context examples, confirming the scalability of our approach. Additionally, we investigate what kind of sensor-based information an LLM relies on to make these inferences by conducting an ablation study on the input features considered for performance prediction. Finally, we explore the novel application of personalized examples for in-context learning, i.e., drawn from the same user being evaluated, finding that they further enhance prediction accuracy. This work paves the path to improving robot behavior in a scalable manner through user-centered feedback.
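The in-context setup described in the abstract can be illustrated with a minimal sketch of how labeled demonstrations and an unlabeled query might be assembled into a single few-shot prompt. Everything here is an assumption made for illustration: the feature names (`robot_speed`, `human_distance`), the 1-to-5 rating scale, the prompt wording, and the helpers `Demo`, `describe`, and `build_prompt` are hypothetical and are not taken from the paper. No LLM is called; the sketch only builds the text that would be sent to one.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Demo:
    """One labeled demonstration (all fields are hypothetical)."""
    robot_speed: Sequence[float]     # robot speed per time step, in m/s
    human_distance: Sequence[float]  # robot-to-human distance per time step, in m
    rating: int                      # assumed 1-5 perceived-performance label


def describe(robot_speed: Sequence[float], human_distance: Sequence[float]) -> str:
    """Serialize the spatio-temporal features of one episode as plain text."""
    speeds = ", ".join(f"{v:.2f}" for v in robot_speed)
    dists = ", ".join(f"{d:.2f}" for d in human_distance)
    return f"Robot speed (m/s): [{speeds}]\nDistance to human (m): [{dists}]"


def build_prompt(demos: List[Demo],
                 query_speed: Sequence[float],
                 query_distance: Sequence[float]) -> str:
    """Concatenate an instruction, the labeled demos, and the unlabeled query."""
    parts = ["Rate how well the robot performed on a scale from 1 to 5."]
    for i, demo in enumerate(demos, start=1):
        features = describe(demo.robot_speed, demo.human_distance)
        parts.append(f"Example {i}:\n{features}\nRating: {demo.rating}")
    # The query ends with an open "Rating:" slot for the model to complete.
    parts.append(f"Query:\n{describe(query_speed, query_distance)}\nRating:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demos = [
        Demo(robot_speed=[0.5, 0.6], human_distance=[1.2, 1.0], rating=4),
        Demo(robot_speed=[0.9, 1.0], human_distance=[0.4, 0.3], rating=2),
    ]
    print(build_prompt(demos, [0.7, 0.7], [0.9, 0.8]))
```

Under these assumptions, adding more in-context examples means passing a longer `demos` list, and the personalized variant corresponds to filling `demos` only with episodes rated by the same user whose perception is being predicted.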
