Stranger Danger! Identifying and Avoiding Unpredictable Pedestrians in RL-based Social Robot Navigation

Sara Pohland; Alvin Tan; Prabal Dutta; Claire Tomlin

TL;DR

This work addresses the domain-shift problem in RL-based social robot navigation by making policies aware of how unpredictable nearby pedestrians are. It introduces three modifications to an existing policy: a training process that uses Noisy ORCA pedestrians, whose behavior deviates from the ORCA model; an uncertainty estimation network that quantifies each pedestrian's unpredictability; and a $\rho$-dependent reward that encourages cautious responses to unpredictable pedestrians. In simulation, the approach yields up to an 82% reduction in collisions and reduces the time spent in pedestrians' personal and intimate spaces without sacrificing overall navigation efficiency, and some key high-level behaviors transfer to a physical robot. The authors describe how to apply the modifications to other RL policies and offer practical guidance for deploying socially aware RL agents in real-world human environments, while acknowledging limitations in tight spaces and the need for further refinement and broader pedestrian models.

Abstract

Reinforcement learning (RL) methods for social robot navigation show great success navigating robots through large crowds of people, but the performance of these learning-based methods tends to degrade in particularly challenging or unfamiliar situations due to the models' dependency on representative training data. To ensure human safety and comfort, it is critical that these algorithms handle uncommon cases appropriately, but the low frequency and wide diversity of such situations present a significant challenge for these data-driven methods. To overcome this challenge, we propose modifications to the learning process that encourage these RL policies to maintain additional caution in unfamiliar situations. Specifically, we improve the Socially Attentive Reinforcement Learning (SARL) policy by (1) modifying the training process to systematically introduce deviations into a pedestrian model, (2) updating the value network to estimate and utilize pedestrian-unpredictability features, and (3) implementing a reward function to learn an effective response to pedestrian unpredictability. Compared to the original SARL policy, our modified policy maintains similar navigation times and path lengths, while reducing the number of collisions by 82% and reducing the proportion of time spent in the pedestrians' personal space by up to 19 percentage points for the most difficult cases. We also describe how to apply these modifications to other RL policies and demonstrate that some key high-level behaviors of our approach transfer to a physical robot.
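
To make the first and third modifications concrete, here is a minimal sketch of how a deviation parameter $\rho$ could perturb a pedestrian's nominal velocity and how the same parameter could widen the clearance the reward asks for. This summary does not give the paper's exact formulation, so everything below is an assumption made for illustration: the linear blend with a random velocity, the linear map from $\rho$ to a discomfort distance, the constants, and the function names `noisy_velocity`, `discomfort_distance`, and `step_reward`.

```python
import math
import random


def noisy_velocity(v_nominal, rho, max_speed=1.0, rng=random):
    """Blend a nominal (e.g. ORCA) velocity with a random velocity.

    rho = 0 returns the nominal velocity and rho = 1 a fully random one.
    The linear blend is an illustrative assumption, not the paper's rule.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    angle = rng.uniform(0.0, 2.0 * math.pi)
    speed = rng.uniform(0.0, max_speed)
    v_rand = (speed * math.cos(angle), speed * math.sin(angle))
    return tuple((1.0 - rho) * vn + rho * vr
                 for vn, vr in zip(v_nominal, v_rand))


def discomfort_distance(rho, d_min=0.1, d_max=0.5):
    """Hypothetical linear map from unpredictability to required clearance (m)."""
    return d_min + rho * (d_max - d_min)


def step_reward(distances, rhos, reached_goal, penalty_scale=0.5,
                collision_reward=-0.25, goal_reward=1.0):
    """One-step reward from robot-pedestrian surface distances.

    distances[i] is the gap to pedestrian i (negative means overlap) and
    rhos[i] is that pedestrian's (estimated) unpredictability.
    """
    if any(d < 0.0 for d in distances):
        return collision_reward
    if reached_goal:
        return goal_reward
    # Penalize the worst violation of any pedestrian's own comfort margin.
    worst = min((d - discomfort_distance(r)
                 for d, r in zip(distances, rhos)), default=0.0)
    return penalty_scale * min(worst, 0.0)
```

Under these assumptions, a gap of 0.3 m incurs no penalty next to a fully predictable pedestrian but is penalized next to a fully unpredictable one, which is the qualitative behavior a $\rho$-dependent reward is meant to encourage.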

Paper Structure (33 sections, 14 equations, 9 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Our Uncertainty-Aware RL Policy
Training Process
Model Architecture
Reward Function
Experimental Evaluation
Simulation Experimental Setups
Performance Metrics
Simulation Results & Analysis
Ablation Study on Noisy ORCA Pedestrians
Comparing Different Discomfort Distances
Ablation Study on Diverse, Realistic Pedestrians
Robotic Experiment
Extensions to other RL policies
...and 18 more sections

Figures (9)

Figure 1: (a) RL-based robot navigation policies are trained with humans that behave according to some pedestrian model. (b) During deployment, these policies will encounter pedestrians that behave differently. Existing RL policies generally do not consider this and continue to treat all pedestrians the same, presenting concerns for human comfort and safety. (c) RL policies should distinguish between predictable (green) and unpredictable (pink) pedestrians and maintain appropriate caution while still navigating efficiently.
Figure 2: Estimated unpredictability values for (a) Noisy ORCA pedestrians and (b) ORCA, CADRL, and Linear pedestrians. Lines indicate the agents' paths, and circles indicate their ending positions. The robot is colored black, and the pedestrians are colored according to their average estimated unpredictability value. Notice that pedestrians who walk haphazardly (pink pedestrians in (a)) and those who walk straight through the middle without engaging in collision avoidance maneuvers (pink pedestrians in (b)) have high associated unpredictability values. Those that behave more normally (blue pedestrians in (a) and (b)) have lower associated values.
Figure 3: Our augmented value network. Given a history of observations for pedestrian $i$, $\mathrm{MLP}_1$ estimates the ORCA policy deviation $\hat{\rho}_i$ associated with that pedestrian. This $\hat{\rho}_i$ is combined with the current observation of the pedestrian and passed into $\mathrm{MLP}_2$. The rest of the network generates a compact representation, $c$, of the entire set of pedestrians, which is combined with the robot's state, $s$, to obtain an estimate of the value function, $\tilde{V}(s, o_1[0:t],\ldots,o_n[0:t])$.
Figure 4: Success rates of various policies navigating among Noisy ORCA pedestrians as pedestrian unpredictability increases across 500 trials. (Left) Ablation study of our uncertainty-aware policy. Performance improves as uncertainty is integrated by successively modifying the Training process, the Model architecture, and the Reward function. (Right) Comparison of our uncertainty-aware policy against standard SARL policies with a variety of fixed discomfort distance parameters.
Figure 5: Example of each category of scenarios used in simulation experiments. Arrows indicate start and goal positions for the robot (black) and pedestrians (gray).
...and 4 more figures
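
The data flow described in the Figure 3 caption can be sketched as follows. This is only an outline under stated assumptions: the learned components (`mlp1`, `mlp2`, `score_fn`, `value_head`) are passed in as plain callables rather than trained networks, the softmax-weighted pooling used to form $c$ is an assumption borrowed from attention-style crowd pooling, and the function name `value_estimate` is hypothetical.

```python
import math


def value_estimate(robot_state, histories, mlp1, mlp2, score_fn, value_head):
    """Sketch of the augmented value network's forward pass.

    robot_state: sequence of floats (s in the caption).
    histories:   one observation history per pedestrian; each history is a
                 non-empty list of observations (sequences of floats).
    mlp1:        history -> scalar estimate rho_hat_i.
    mlp2:        [current observation, rho_hat_i] -> feature vector.
    score_fn:    feature vector -> scalar pooling score (assumed).
    value_head:  [robot_state, c] -> scalar value.
    Returns (value, list of rho_hat estimates).
    """
    if not histories:
        raise ValueError("at least one pedestrian history is required")
    rho_hats, feats, scores = [], [], []
    for hist in histories:
        rho_hat = mlp1(hist)                       # unpredictability estimate
        feat = mlp2(list(hist[-1]) + [rho_hat])    # current obs plus rho_hat
        rho_hats.append(rho_hat)
        feats.append(feat)
        scores.append(score_fn(feat))
    # Softmax over pedestrians (numerically stabilized), then weighted sum
    # to obtain the compact crowd representation c.
    top = max(scores)
    weights = [math.exp(sc - top) for sc in scores]
    total = sum(weights)
    dim = len(feats[0])
    c = [sum(w / total * f[k] for w, f in zip(weights, feats))
         for k in range(dim)]
    return value_head(list(robot_state) + c), rho_hats
```

Passing the components in as callables keeps the sketch independent of any particular deep learning library; in practice each would be a trained network.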
