SocialNav: Training Human-Inspired Foundation Model for Socially-Aware Embodied Navigation
Ziyi Chen, Yingnan Guo, Zedong Chu, Minghua Luo, Yanfen Shen, Mingchao Sun, Junjun Hu, Shichao Xie, Kuan Yang, Pei Shi, Zhining Gu, Lu Liu, Honglin Han, Xiaolong Wu, Mu Xu, Yu Zhang
TL;DR
SocialNav tackles socially aware embodied navigation by coupling a high-level social-reasoning Brain with a low-level flow-based action expert. It introduces the SocNav Dataset (a Cognitive Activation Dataset, CAD, and an Expert Trajectories Pyramid, ETP, totaling over 7 million samples) for training and the SocNav Benchmark (high-fidelity, 9 scenes) for evaluation, and proposes SAFE-GRPO, a flow-based reinforcement learning (RL) objective that explicitly rewards social compliance. Through a three-stage training pipeline (pre-training on diverse video, simulation, and cognitive data; real-world fine-tuning with the Brain frozen; and norm-aware RL), the approach achieves substantial gains in navigation success and social compliance over state-of-the-art baselines (e.g., +38% SR, +46% on the DCR/TCR social compliance metrics). The results on CityWalker, the SocNav Benchmark, and real-world deployments demonstrate practical improvements in safety, normative alignment, and real-time performance, advancing embodied social intelligence for robots.
Abstract
Embodied navigation that adheres to social norms remains an open research challenge. Our SocialNav is a foundational model for socially-aware navigation with a hierarchical "brain-action" architecture, capable of understanding high-level social norms and generating low-level, socially compliant trajectories. To enable such dual capabilities, we construct the SocNav Dataset, a large-scale collection of 7 million samples, comprising (1) a Cognitive Activation Dataset providing social reasoning signals such as chain-of-thought explanations and social traversability prediction, and (2) an Expert Trajectories Pyramid aggregating diverse navigation demonstrations from internet videos, simulated environments, and real-world robots. A multi-stage training pipeline is proposed to gradually inject and refine navigation intelligence: we first inject general navigation skills and social norms understanding into the model via imitation learning, and then refine such skills through a deliberately designed Socially-Aware Flow Exploration GRPO (SAFE-GRPO), the first flow-based reinforcement learning framework for embodied navigation that explicitly rewards socially compliant behaviors. SocialNav achieves +38% success rate and +46% social compliance rate compared to the state-of-the-art method, demonstrating strong gains in both navigation performance and social compliance. Our project page: https://amap-eai.github.io/SocialNav/
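
To make the idea of rewarding socially compliant behavior in a GRPO-style objective more concrete, the sketch below shows only the generic group-relative advantage step that GRPO methods share: rewards for a group of sampled trajectories are normalized by the group mean and standard deviation. It is not the SAFE-GRPO objective itself. The reward shape in `combined_reward`, the weight `w_social`, and the sample rollouts are hypothetical and chosen purely for illustration.

```python
from statistics import mean, pstdev


def combined_reward(reached_goal, social_score, w_social=0.5):
    """Hypothetical reward: task success plus a weighted social-compliance term.

    Both the form of this reward and w_social are assumptions for this sketch,
    not values taken from the paper.
    """
    return float(reached_goal) + w_social * social_score


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group (generic GRPO-style step)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four made-up rollouts for one scene: (reached_goal, social_score in [0, 1]).
rollouts = [(True, 0.9), (True, 0.2), (False, 0.8), (False, 0.1)]
rewards = [combined_reward(goal, score) for goal, score in rollouts]
advantages = group_relative_advantages(rewards)
```

Under this assumed reward, a trajectory that reaches the goal while staying socially compliant receives the largest advantage in its group, and one that fails on both counts receives the smallest.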
