SocialNav: Training Human-Inspired Foundation Model for Socially-Aware Embodied Navigation

Ziyi Chen, Yingnan Guo, Zedong Chu, Minghua Luo, Yanfen Shen, Mingchao Sun, Junjun Hu, Shichao Xie, Kuan Yang, Pei Shi, Zhining Gu, Lu Liu, Honglin Han, Xiaolong Wu, Mu Xu, Yu Zhang

TL;DR

SocialNav tackles socially aware embodied navigation by coupling a high-level social-reasoning Brain with a low-level flow-based action expert. It introduces the SocNav Dataset (a Cognitive Activation Dataset, CAD, and an Expert Trajectories Pyramid, ETP, totaling over 7 million samples) and the SocNav Benchmark (high-fidelity, 9 scenes) for training and evaluation, and proposes SAFE-GRPO, a flow-based RL objective that explicitly rewards social compliance. Training proceeds in three stages: pre-training on diverse video, simulation, and cognitive data; real-world fine-tuning with the Brain frozen; and norm-aware RL. The approach achieves substantial gains in navigation success rate (SR) and social compliance over SOTA baselines (e.g., +38% SR, +46% DCR/TCR). Results on CityWalker, the SocNav Benchmark, and real-world deployments demonstrate practical improvements in safety, normative alignment, and real-time performance, advancing embodied social intelligence for robots.

Abstract

Embodied navigation that adheres to social norms remains an open research challenge. Our SocialNav is a foundational model for socially-aware navigation with a hierarchical "brain-action" architecture, capable of understanding high-level social norms and generating low-level, socially compliant trajectories. To enable such dual capabilities, we construct the SocNav Dataset, a large-scale collection of 7 million samples, comprising (1) a Cognitive Activation Dataset providing social reasoning signals such as chain-of-thought explanations and social traversability prediction, and (2) an Expert Trajectories Pyramid aggregating diverse navigation demonstrations from internet videos, simulated environments, and real-world robots. A multi-stage training pipeline is proposed to gradually inject and refine navigation intelligence: we first inject general navigation skills and social norms understanding into the model via imitation learning, and then refine such skills through a deliberately designed Socially-Aware Flow Exploration GRPO (SAFE-GRPO), the first flow-based reinforcement learning framework for embodied navigation that explicitly rewards socially compliant behaviors. SocialNav achieves +38% success rate and +46% social compliance rate compared to the state-of-the-art method, demonstrating strong gains in both navigation performance and social compliance. Our project page: https://amap-eai.github.io/SocialNav/
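
As a rough illustration of the group-relative idea behind GRPO-style objectives, the sketch below normalizes the rewards of several trajectories sampled for the same observation against their group mean and standard deviation. The reward function `combined_reward`, its weight `w_social`, and the example scores are hypothetical and are not taken from the paper; the abstract does not specify SAFE-GRPO's actual reward terms or its flow-based exploration, so this should be read only as a minimal sketch of the normalization step.

```python
from statistics import mean, pstdev


def combined_reward(reached_goal, social_compliance, w_social=0.5):
    """Hypothetical reward: task success plus a weighted compliance score in [0, 1]."""
    return float(reached_goal) + w_social * social_compliance


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps guards against division by zero when all rewards are equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four trajectories sampled for the same observation (made-up scores).
group = [
    combined_reward(True, 0.9),   # reaches the goal, stays on walkways
    combined_reward(True, 0.2),   # reaches the goal, cuts across a lawn
    combined_reward(False, 0.8),  # compliant but fails to reach the goal
    combined_reward(False, 0.1),  # neither compliant nor successful
]
advantages = group_relative_advantages(group)
```

Under these assumed scores, the trajectory that is both successful and compliant receives the largest advantage, and the advantages sum to approximately zero because they are centered on the group mean.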

Paper Structure

This paper contains 46 sections, 11 equations, 9 figures, 9 tables.

Figures (9)

  • Figure 1: Socially-Aware Navigation in Real-World Environments. SocialNav combines high-level semantic reasoning with low-level trajectory generation. It identifies socially traversable zones and generates CoT explanations, planning routes that respect social norms.
  • Figure 2: Overview of the SocNav Dataset and Benchmark. The SocNav Dataset (left) illustrates the hierarchical structure and data construction pipeline, composed of the Expert Trajectories Pyramid (ETP) and Cognitive Activation Dataset (CAD). The SocNav Benchmark (right) is a high-fidelity evaluation platform featuring large-scale, diverse social environments and offering comprehensive metrics to assess socially-aware navigation performance.
  • Figure 3: SocialNav Architecture and Training Pipeline. SocialNav adopts a hierarchical architecture, with a VLM-based Brain for high-level semantic reasoning and an action expert for generating socially compliant trajectories. We adopt a three-stage training strategy: Pre-training, Fine-tuning, and SAFE-GRPO.
  • Figure 4: Qualitative comparison on the SocNav Benchmark. We visualize representative trajectories in three scenes (Crossing, Park, Campus). The left column shows top-down path views with our method (green) and the CityWalker baseline (red), where warning signs mark unsafe or socially improper behaviors. The right columns depict corresponding egocentric views: SocialNav remains on sidewalks and walkways, while the baseline often takes shorter but socially risky routes through restricted regions (such as driveways, dry streambeds, lawns, and green belts) or crashes into obstacles like glass walls and trees.
  • Figure 5: Standard and recovery trajectories in $D_\text{sim}$. Visual examples from SocCity scenes. Green curves denote standard expert trajectories obtained by A* planning on the navigation graph, while the other curves depict locally sampled recovery trajectories originating from intermediate points. The background shows the semantic occupancy map $\mathcal{M}_\text{occ}$ with walkable regions (white) and non-traversable regions (black).
  • ...and 4 more figures