Towards Data-Driven Metrics for Social Robot Navigation Benchmarking

Pilar Bachiller-Burgos, Ulysses Bernardet, Luis V. Calderita, Pranup Chhetri, Anthony Francis, Noriaki Hirose, Noé Pérez, Dhruv Shah, Phani T. Singamaneni, Xuesu Xiao, Luis J. Manso

TL;DR

This work addresses the absence of standardized, human-aligned benchmarks for social robot navigation by proposing All-encompassing Learned Trajectory-wise (ALT) metrics learned from full-trajectory human ratings. It provides a dataset specification, open-source tooling, and a proof-of-concept ALT metric trained on 4427 trajectories (real and simulated) with 4402 high-quality ratings, achieving a test MSE of $0.0457$ and MAE of $0.160$, and showing stronger correlation with human judgments than analytic metrics. The approach encodes trajectories as sequences of enriched features plus LLM-derived context embeddings and uses a GRU-based model to predict trajectory scores, enabling policy optimization and benchmarking. Qualitative analyses demonstrate that the metric captures context- and task-dependent trade-offs between speed, proxemics, and safety, pointing to scalable future work to broaden dataset coverage and refine architectures for domain-specific navigation reasoning.

Abstract

This paper presents a joint effort towards the development of a data-driven Social Robot Navigation metric to facilitate benchmarking and policy optimization for ground robots. We compiled a dataset with 4427 trajectories -- 182 real and 4245 simulated -- and presented it to human raters, yielding a total of 4402 rated trajectories after data quality assurance. Notably, we provide the first all-encompassing learned social robot navigation metric, along with qualitative and quantitative results, including the test loss achieved, a comparison against hand-crafted metrics, and an ablation study. All data, software, and model weights are publicly available.
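
To make the quantitative results mentioned above concrete, the following is a minimal sketch of how a learned metric's scores could be compared with human ratings using MSE, MAE, and a correlation coefficient. It is not the authors' evaluation code: the function names, the choice of Pearson correlation, and the four example ratings are all illustrative assumptions.

```python
import math

def mse(pred, target):
    """Mean squared error between predicted and human scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def mae(pred, target):
    """Mean absolute error between predicted and human scores."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)

def pearson(x, y):
    """Pearson correlation coefficient (assumed here; the summary above
    does not say which correlation measure is used)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Hypothetical scores for four trajectories (made up for illustration only).
human_ratings = [0.9, 0.2, 0.6, 0.4]
learned_scores = [0.8, 0.3, 0.5, 0.5]

print("MSE:", mse(learned_scores, human_ratings))
print("MAE:", mae(learned_scores, human_ratings))
print("Pearson r:", pearson(learned_scores, human_ratings))
```

The same three functions could be applied to the output of any hand-crafted metric, which is one way a learned metric and an analytic one might be compared against the same set of human ratings.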

Paper Structure

This paper contains 15 sections, 1 equation, 4 figures, 4 tables.

Figures (4)

  • Figure 1: We acquire trajectories from both real and simulated scenarios. Simulated trajectories have a lower cost and allow for ethical recording of unsafe behavior, but real trajectories are still required to ensure generalization. We generate a top-down view for each trajectory and show it to raters to collect scores. All the data collected is public, including trajectories, raters' data, and their ratings. Metric and policy learning are use cases for the dataset.
  • Figure 2: Consistency map of selected raters.
  • Figure 3: Plot for the control questions with their corresponding mean, standard deviation, and the estimations made by the model. The control questions are shown sorted according to their mean score.
  • Figure 4: Depiction of the output of the learned metric for different trajectory variations, situations, contexts and speeds. In all scenarios, the robot and the goal are in the same position, as shown in Fig. \ref{fig:qual_scenario}, in orange at the bottom and green at the top, respectively. In Figs. \ref{fig:qual_one_human} and \ref{fig:qual_walk_forward}, there is a single human; in the first case static, in the second approaching the robot's initial location. In Fig. \ref{fig:qual_three_humans}, there are three humans, as depicted in Fig. \ref{fig:qual_scenario}. To reach its goal, the robot follows different trajectories, with varying degrees of divergence from the central line (top-left image of Fig. \ref{fig:qual_scenario}). The context identifiers are shown on the top-left of each figure and described in Sec. \ref{qualitative}. Speed is color-coded: 0.20 m/s, 0.40 m/s, 0.80 m/s, 1.60 m/s.