Towards Data-Driven Metrics for Social Robot Navigation Benchmarking
Pilar Bachiller-Burgos, Ulysses Bernardet, Luis V. Calderita, Pranup Chhetri, Anthony Francis, Noriaki Hirose, Noé Pérez, Dhruv Shah, Phani T. Singamaneni, Xuesu Xiao, Luis J. Manso
TL;DR
This work addresses the absence of standardized, human-aligned benchmarks for social robot navigation by proposing All-encompassing Learned Trajectory-wise (ALT) metrics learned from full-trajectory human ratings. It provides a dataset specification, open-source tooling, and a proof-of-concept ALT metric trained on 4427 trajectories (real and simulated) with 4402 high-quality ratings, achieving a test MSE of $0.0457$ and MAE of $0.160$, and showing stronger correlation with human judgments than analytic metrics. The approach encodes trajectories as sequences of enriched features plus LLM-derived context embeddings and uses a GRU-based model to predict trajectory scores, enabling policy optimization and benchmarking. Qualitative analyses demonstrate that the metric captures context- and task-dependent trade-offs between speed, proxemics, and safety, pointing to scalable future work to broaden dataset coverage and refine architectures for domain-specific navigation reasoning.
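As a point of reference for the reported test errors, the sketch below shows how MSE and MAE are conventionally computed between predicted trajectory scores and human ratings. The function names and the sample scores are hypothetical, chosen only for illustration; they are not taken from the released tooling or dataset.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and human-rated scores."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between predicted and human-rated scores."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    # Hypothetical scores for three trajectories (not real data).
    human_scores = [0.8, 0.3, 0.6]
    predicted_scores = [0.7, 0.5, 0.6]
    print(f"MSE: {mse(predicted_scores, human_scores):.4f}")
    print(f"MAE: {mae(predicted_scores, human_scores):.4f}")
```

MSE penalizes large deviations from the human rating more heavily than MAE, which is why the two figures are usually read together.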
Abstract
This paper presents a joint effort towards the development of a data-driven Social Robot Navigation metric to facilitate benchmarking and policy optimization for ground robots. We compiled a dataset with 4427 trajectories -- 182 real and 4245 simulated -- and presented it to human raters, yielding a total of 4402 rated trajectories after data quality assurance. Notably, we provide the first all-encompassing learned social robot navigation metric, along with qualitative and quantitative results, including the test loss achieved, a comparison against hand-crafted metrics, and an ablation study. All data, software, and model weights are publicly available.
