SocRATES: Towards Automated Scenario-based Testing of Social Navigation Algorithms
Shashank Rao Marpally, Pranav Goyal, Harold Soh
TL;DR
The paper tackles the challenge of evaluating social navigation by introducing SocRATES, a pipeline that automatically generates context- and location-specific social navigation scenarios from simple textual and image inputs using large language models (LLMs) and vision-language models (VLMs). These scenarios are instantiated in a HuNavSim/ROS2 Gazebo simulation to assess both robot task performance and social appropriateness, enabling controlled benchmarking beyond proxemics-based metrics. Key contributions include a five-component methodology (map annotation, scenario description, path generation for pedestrians and robot, pedestrian behavior generation, and simulation), interactive path correction, and demonstrations through design analyses, usability studies with researchers, and a persona-based case study. The results show fast scenario generation at low cost, a substantial first-pass success rate with iterative refinement, and meaningful time savings and insights for comparing navigation algorithms across varied social contexts. Overall, SocRATES provides a scalable, flexible platform for comprehensive social navigation assessment that complements existing proxemics-focused benchmarks.
Abstract
Current social navigation methods and benchmarks primarily focus on proxemics and task efficiency. While these factors are important, qualitative aspects such as perceptions of a robot's social competence are equally crucial for successful adoption and integration into human environments. We propose a more comprehensive evaluation of social navigation through scenario-based testing, where specific human-robot interaction scenarios can reveal key robot behaviors. However, creating such scenarios is often labor-intensive and complex. In this work, we address this challenge by introducing a pipeline that automates the generation of context- and location-appropriate social navigation scenarios, ready for simulation. Our pipeline transforms simple scenario metadata into detailed textual scenarios, infers pedestrian and robot trajectories, and simulates pedestrian behaviors, which enables more controlled evaluation. We leverage the social reasoning and code-generation capabilities of Large Language Models (LLMs) to streamline scenario generation and translation. Our experiments show that our pipeline produces realistic scenarios and significantly improves scenario translation over naive LLM prompting. Additionally, we present initial feedback from a usability study with social navigation experts and a case study demonstrating a scenario-based evaluation of three navigation algorithms.
