SocialNav-Map: Dynamic Mapping with Human Trajectory Prediction for Zero-Shot Social Navigation

Lingfeng Zhang, Erjia Xiao, Xiaoshuai Hao, Haoxiang Fu, Zeying Gong, Long Chen, Xiaojun Liang, Renjing Xu, Hangjun Ye, Wenbo Ding

TL;DR

This work tackles zero-shot social navigation in dynamic human environments by removing the need for environment-specific training. It introduces SocialNav-Map, a framework that constructs a dynamic occupancy map and fuses dual human trajectory predictions—history-based and orientation-based—into proactive obstacle planning, using an efficient Fast Marching Method for path planning. Across Social-HM3D and Social-MP3D, SocialNav-Map matches or exceeds state-of-the-art RL methods that require thousands of GPU hours, while reducing human collision rates by over 10% and avoiding fine-tuning in new settings. The approach offers practical deployment benefits for real-world indoor navigation, with open-source code provided for reproducibility and extension.

Abstract

Social navigation in densely populated dynamic environments poses a significant challenge for autonomous mobile robots, requiring advanced strategies for safe interaction. Existing reinforcement learning (RL)-based methods require more than 2,000 hours of training and often struggle to generalize to unfamiliar environments without additional fine-tuning, limiting their practical application in real-world scenarios. To address these limitations, we propose SocialNav-Map, a novel zero-shot social navigation framework that combines dynamic human trajectory prediction with occupancy mapping, enabling safe and efficient navigation without the need for environment-specific training. Specifically, SocialNav-Map first transforms the task goal position into the constructed map coordinate system. Subsequently, it creates a dynamic occupancy map that incorporates predicted human movements as dynamic obstacles. The framework employs two complementary methods for human trajectory prediction: history prediction and orientation prediction. By integrating these predicted trajectories into the occupancy map, the robot can proactively avoid potential collisions with humans while efficiently navigating to its destination. Extensive experiments on the Social-HM3D and Social-MP3D datasets demonstrate that SocialNav-Map significantly outperforms state-of-the-art (SOTA) RL-based methods, which require 2,396 GPU hours of training. Notably, it reduces human collision rates by over 10% without necessitating any training in novel environments. By eliminating the need for environment-specific training, SocialNav-Map achieves superior navigation performance, paving the way for the deployment of social navigation systems in real-world environments characterized by diverse human behaviors. The code is available at: https://github.com/linglingxiansen/SocialNav-Map.
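The first step named in the abstract, transforming the goal position into the map coordinate system, can be illustrated with a minimal sketch. The abstract does not give the map resolution, origin, or axis conventions, so the function name world_to_map, the 0.05 m cell size, the 480 x 480 grid, and the row/column layout below are all assumptions made for illustration rather than the authors' implementation.

```python
import math

def world_to_map(point_xy, origin_xy, resolution, shape):
    """Convert a world-frame (x, y) point in metres to a (row, col) grid cell.

    Assumed conventions (not taken from the paper): origin_xy is the world
    position of the corner of cell (0, 0), x grows with the column index,
    and y grows with the row index.
    """
    col = math.floor((point_xy[0] - origin_xy[0]) / resolution)
    row = math.floor((point_xy[1] - origin_xy[1]) / resolution)
    rows, cols = shape
    # Clamp so that a goal slightly outside the mapped area still maps to a valid cell.
    row = min(max(row, 0), rows - 1)
    col = min(max(col, 0), cols - 1)
    return row, col

# Hypothetical setup: a 24 m x 24 m map centred on the robot's start position.
goal_cell = world_to_map((3.21, -1.02), origin_xy=(-12.0, -12.0),
                         resolution=0.05, shape=(480, 480))
print(goal_cell)
```

Once the goal is expressed as a grid cell, the planner and the obstacle layers can operate in the same index space, which is what makes it possible to treat predicted human positions as ordinary occupied cells.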

Paper Structure

This paper contains 20 sections, 17 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Comparison of reinforcement learning-based social navigation methods and our SocialNav-Map framework. Traditional reinforcement learning (RL)-based methods necessitate extensive training, totaling 2,396 GPU hours, through trial-and-error in simulated environments prior to deployment. In contrast, we present SocialNav-Map, a zero-shot social navigation framework that tackles social navigation tasks by constructing dynamic occupancy maps and predicting human trajectories. Notably, SocialNav-Map achieves superior performance compared to state-of-the-art RL-based methods, all without the need for any training.
  • Figure 2: The general pipeline of our SocialNav-Map framework. The system first constructs a static occupancy map from observation data through 3D point cloud generation and voxel discretization, then converts target locations into map coordinates. The core of SocialNav-Map employs dual trajectory prediction: history prediction analyzes past human motion patterns, while orientation prediction infers future positions. The two predictions are fused and added to the map as dynamic obstacles with time decay. Finally, a fast marching method generates optimal collision-free paths and converts them into discrete navigation actions for zero-shot social navigation.
  • Figure 3: Human trajectory prediction and dynamic obstacle management in SocialNav-Map. History prediction analyzes past human motion and uses linear regression fitting to infer future trajectories. Orientation prediction uses the current human pose and orientation to predict future positions via linear extension. These two prediction methods are fused to generate comprehensive trajectory predictions, which are integrated into the occupancy map as dynamic obstacles. These predicted human obstacles are automatically removed from the map after a certain decay period, preventing outdated predictions from hindering robot navigation while ensuring safety during active human movement.
  • Figure 4: Qualitative results of our SocialNav-Map.
  • Figure 5: Hyperparameter Ablation Study on Social-HM3D.
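
The mechanism described in the Figure 3 caption (linear-regression history prediction, linear-extension orientation prediction, fusion, and time-decayed dynamic obstacles) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the captions do not specify the fusion rule, so the sketch simply takes the union of both predicted point sets, and the names predict_from_history, predict_from_orientation, DynamicObstacleLayer, and decay_steps are hypothetical.

```python
import numpy as np

def predict_from_history(history, horizon, dt=1.0):
    """Fit x(t) and y(t) with a least-squares line and extrapolate `horizon` steps.

    history: (T, 2) past positions sampled every dt, with T >= 2.
    """
    history = np.asarray(history, dtype=float)
    t = np.arange(len(history)) * dt
    # With 2-D y, np.polyfit returns shape (2, 2): row 0 slopes, row 1 intercepts.
    slopes, intercepts = np.polyfit(t, history, deg=1)
    future_t = t[-1] + dt * np.arange(1, horizon + 1)
    return future_t[:, None] * slopes + intercepts

def predict_from_orientation(position, heading_rad, speed, horizon, dt=1.0):
    """Extend the current position along the heading at constant speed."""
    direction = np.array([np.cos(heading_rad), np.sin(heading_rad)])
    steps = dt * np.arange(1, horizon + 1)
    return np.asarray(position, dtype=float) + speed * steps[:, None] * direction

class DynamicObstacleLayer:
    """Grid of expiry times; a cell counts as occupied until its expiry step passes."""

    def __init__(self, shape, resolution, decay_steps):
        self.expiry = np.zeros(shape, dtype=int)
        self.resolution = resolution
        self.decay_steps = decay_steps  # assumed decay period, in planner steps

    def add(self, points_xy, now):
        cells = np.floor(np.asarray(points_xy, dtype=float) / self.resolution).astype(int)
        cols, rows = cells[:, 0], cells[:, 1]
        inside = ((rows >= 0) & (rows < self.expiry.shape[0])
                  & (cols >= 0) & (cols < self.expiry.shape[1]))
        self.expiry[rows[inside], cols[inside]] = now + self.decay_steps

    def occupied(self, now):
        return self.expiry > now

# Assumed fusion rule: the union of both prediction sets becomes obstacle cells.
past = [(0.5, 0.5), (1.5, 0.5), (2.5, 0.5)]
fused = np.vstack([predict_from_history(past, horizon=2),
                   predict_from_orientation(past[-1], np.pi / 2, 1.0, horizon=2)])
layer = DynamicObstacleLayer(shape=(10, 10), resolution=1.0, decay_steps=3)
layer.add(fused, now=0)
```

Storing an expiry step per cell, rather than a boolean flag, means stale predictions disappear without an explicit cleanup pass: the planner only needs to query occupied(now) and combine the result with the static map before path planning.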