
SocRATES: Towards Automated Scenario-based Testing of Social Navigation Algorithms

Shashank Rao Marpally, Pranav Goyal, Harold Soh

TL;DR

The paper tackles the challenge of evaluating social navigation by introducing SocRATES, a pipeline that automatically generates context- and location-specific social navigation scenarios from simple textual and image inputs using large language models (LLMs) and vision-language models (VLMs). These scenarios are instantiated in a HuNavSim/ROS2 Gazebo simulation to assess both robot task performance and social appropriateness, enabling controlled benchmarking beyond proxemics-based metrics. Key contributions include a five-component methodology (map annotation, scenario description, path generation for pedestrians and robot, pedestrian behavior generation, and simulation), interactive path correction, and demonstrations through design analyses, usability studies with researchers, and a persona-based case study. The results show fast, low-cost scenario generation, a substantial first-pass success rate that iterative refinement improves further, and meaningful time savings and insights when comparing navigation algorithms across varied social contexts. Overall, SocRATES provides a scalable, flexible platform for comprehensive social navigation assessment that complements existing proxemics-focused benchmarks.
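The five components listed above form a linear data flow, where each stage consumes the previous stage's output. The Python sketch below illustrates that flow under stated assumptions: every class and function name is hypothetical, simple stubs stand in for the LLM/VLM calls, and the "regular" default behavior label is invented for illustration. It is not the SocRATES implementation, only a way to picture how annotated map locations become a flat specification that a simulator could consume.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical data types for illustration; not SocRATES's actual code.
@dataclass
class AnnotatedMap:
    # Stage 1 (map annotation): named locations mapped to (x, y) coordinates.
    locations: dict[str, tuple[float, float]]


@dataclass
class Scenario:
    # Stage 2 output: the textual scenario description.
    description: str
    # Stage 3 output: ordered (x, y) waypoints per agent; "robot" is the robot.
    paths: dict[str, list[tuple[float, float]]] = field(default_factory=dict)
    # Stage 4 output: a behavior label for each pedestrian.
    behaviors: dict[str, str] = field(default_factory=dict)


def describe_scenario(context: str, task: str) -> Scenario:
    # Stage 2 stub: a fixed template standing in for an LLM/VLM call.
    return Scenario(description=f"In a {context}, the robot must {task}.")


def generate_paths(scenario: Scenario, amap: AnnotatedMap,
                   routes: dict[str, list[str]]) -> Scenario:
    # Stage 3: resolve each agent's route of location names into coordinates.
    for agent, names in routes.items():
        scenario.paths[agent] = [amap.locations[n] for n in names]
    return scenario


def generate_behaviors(scenario: Scenario, behaviors: dict[str, str]) -> Scenario:
    # Stage 4: give every pedestrian (any non-robot agent) a behavior label.
    for agent in scenario.paths:
        if agent != "robot":
            scenario.behaviors[agent] = behaviors.get(agent, "regular")
    return scenario


def to_simulation_spec(scenario: Scenario) -> dict:
    # Stage 5: flatten into a plain dict that a simulation launcher could read.
    return {
        "description": scenario.description,
        "agents": [
            {"name": a, "waypoints": p, "behavior": scenario.behaviors.get(a)}
            for a, p in scenario.paths.items()
        ],
    }


# Usage: a toy map with three annotated locations and two agents.
amap = AnnotatedMap({"entrance": (0.0, 0.0), "lift": (5.0, 2.0), "desk": (3.0, 4.0)})
s = describe_scenario("hospital lobby", "deliver supplies to the desk")
s = generate_paths(s, amap, {"robot": ["entrance", "desk"], "ped_1": ["lift", "entrance"]})
s = generate_behaviors(s, {"ped_1": "hurried"})
spec = to_simulation_spec(s)
```

Keeping each stage a separate function mirrors the modular design described above: any single stage (for example, path generation) can be rerun or corrected without regenerating the others.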

Abstract

Current social navigation methods and benchmarks primarily focus on proxemics and task efficiency. While these factors are important, qualitative aspects such as perceptions of a robot's social competence are equally crucial for successful adoption and integration into human environments. We propose a more comprehensive evaluation of social navigation through scenario-based testing, where specific human-robot interaction scenarios can reveal key robot behaviors. However, creating such scenarios is often labor-intensive and complex. In this work, we address this challenge by introducing a pipeline that automates the generation of context- and location-appropriate social navigation scenarios, ready for simulation. Our pipeline transforms simple scenario metadata into detailed textual scenarios, infers pedestrian and robot trajectories, and simulates pedestrian behaviors, enabling more controlled evaluation. We leverage the social reasoning and code-generation capabilities of Large Language Models (LLMs) to streamline scenario generation and translation. Our experiments show that our pipeline produces realistic scenarios and significantly improves scenario translation over naive LLM prompting. Additionally, we present initial feedback from a usability study with social navigation experts and a case study demonstrating a scenario-based evaluation of three navigation algorithms.
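The abstract reports that the pipeline improves scenario translation over naive LLM prompting. One generic safeguard when translating text into trajectories is to request structured output and validate it before simulation. The sketch below assumes a hypothetical JSON schema in which the LLM maps each agent name to a list of annotated location names; `validate_paths` is likewise a hypothetical helper that parses the output and collects problems (unknown locations, paths that are too short, a missing robot path) that could be shown to a user for interactive path correction. It illustrates the idea only and is not the paper's method.

```python
import json


def validate_paths(llm_output: str, known_locations: set[str]) -> tuple[dict, list[str]]:
    """Parse a JSON path proposal and collect problems for a user to fix.

    Assumed (hypothetical) schema: {"agent_name": ["loc_a", "loc_b", ...], ...}
    """
    errors: list[str] = []
    try:
        paths = json.loads(llm_output)
    except json.JSONDecodeError as exc:
        return {}, [f"output is not valid JSON: {exc.msg}"]
    if not isinstance(paths, dict):
        return {}, ["top-level JSON value must be an object"]
    if "robot" not in paths:
        errors.append("no path given for the robot")
    for agent, route in paths.items():
        # Every path needs a start and a goal, i.e. at least two waypoints.
        if not isinstance(route, list) or len(route) < 2:
            errors.append(f"{agent}: path needs at least two waypoints")
            continue
        # Every waypoint must name a location annotated on the map.
        for name in route:
            if not isinstance(name, str) or name not in known_locations:
                errors.append(f"{agent}: unknown location '{name}'")
    return paths, errors


known = {"entrance", "lift", "desk"}
proposal = '{"robot": ["entrance", "desk"], "ped_1": ["lift", "cafe"]}'
paths, problems = validate_paths(proposal, known)
print(problems)  # ["ped_1: unknown location 'cafe'"]
```

Returning the parsed paths together with a list of problems, rather than raising on the first error, lets a user see every issue in one pass and correct them together.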

Paper Structure

This paper contains 15 sections and 4 figures.

Figures (4)

  • Figure 1: We propose SocRATES, an automated system that leverages VLMs to generate simulated social navigation scenarios from simple textual and image inputs.
  • Figure 2: Overview of our pipeline. We prompt users to annotate the map of their desired location (1) and provide simple textual inputs for their desired scenario. Our pipeline proposes a scenario (2) and then generates the two main components of the scenario with structured prompts to an LLM: the paths of the robot and pedestrians (3) and the behavior of the humans (4). Finally, these are used by the HuNavSim framework [perez2023hunavsim] (5) to generate a simulation of the scenario.
  • Figure 3: Participant ratings for the navigation algorithms for the four scenarios across various social dimensions.
  • Figure 4: Two of the scenarios generated in the Persona-based Assessment.
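Step (4) in Figure 2 generates the behavior of the simulated humans. One common way to encode such reactive behaviors is a behavior tree. The toy tree below is written from scratch with hypothetical node classes (it does not use HuNavSim's or any library's API). It describes a "curious" pedestrian who stops to look at the robot when it comes within an assumed 1.5 m threshold, and otherwise keeps walking to the next waypoint.

```python
from enum import Enum
from math import dist


class Status(Enum):
    SUCCESS = 1
    FAILURE = 2


class Condition:
    # Leaf that succeeds when its predicate holds on the blackboard.
    def __init__(self, predicate):
        self.predicate = predicate

    def tick(self, bb: dict) -> Status:
        return Status.SUCCESS if self.predicate(bb) else Status.FAILURE


class Action:
    # Leaf that records the chosen action on the blackboard and succeeds.
    def __init__(self, name: str):
        self.name = name

    def tick(self, bb: dict) -> Status:
        bb["action"] = self.name
        return Status.SUCCESS


class Sequence:
    # Succeeds only if every child succeeds, stopping at the first failure.
    def __init__(self, *children):
        self.children = children

    def tick(self, bb: dict) -> Status:
        for child in self.children:
            if child.tick(bb) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


class Fallback:
    # Returns on the first child success; fails only if all children fail.
    def __init__(self, *children):
        self.children = children

    def tick(self, bb: dict) -> Status:
        for child in self.children:
            if child.tick(bb) is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


NEAR = 1.5  # metres; an assumed threshold for "the robot is close"

# Hypothetical "curious pedestrian": react to a nearby robot, else keep walking.
curious = Fallback(
    Sequence(
        Condition(lambda bb: dist(bb["ped"], bb["robot"]) < NEAR),
        Action("stop_and_look_at_robot"),
    ),
    Action("walk_to_next_waypoint"),
)

bb = {"ped": (0.0, 0.0), "robot": (1.0, 1.0)}  # distance is about 1.41 m
curious.tick(bb)
print(bb["action"])  # stop_and_look_at_robot
```

A Fallback over a guarded Sequence is the usual way to express a priority rule in a behavior tree: the higher-priority reaction is tried first, and the default activity runs only when its condition fails.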