From Obstacles to Etiquette: Robot Social Navigation with VLM-Informed Path Selection

Zilin Fang; Anxing Xiao; David Hsu; Gim Hee Lee

TL;DR

The paper proposes a social robot navigation framework that blends geometric path planning with context-aware social reasoning from a task-specific vision-language model (VLM). The system samples geometrically feasible candidate paths and selects among them with a fine-tuned VLM whose social reasoning is distilled into a compact model (Qwen-2.5 7B) for real-time use. Selection runs in a receding-horizon loop, and the chosen path serves as the reference for a local ORCA-based controller. Experiments on a Boston Dynamics Spot robot across four social scenarios show the best overall performance against multiple baselines: collision-free trajectories, no social-zone intrusions, and the lowest personal-space violation duration. The work demonstrates that grounding social norms in a VLM, combined with human motion prediction and anchor-based path planning, yields robust and scalable social navigation across diverse human-centered contexts.
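
The anchor-based sampling of geometrically feasible paths mentioned above can be illustrated with a minimal sketch. It assumes one anchor per candidate, placed at the midpoint of the start-goal segment and shifted sideways by a chosen offset, and it treats people and obstacles as disks with a fixed clearance. The names `sample_anchor_paths` and `feasible_candidates` and the 0.6 m clearance are hypothetical choices, not the paper's planner, which operates on a prediction-fused costmap. The sketch stops at the feasible set; in the described system, those candidates would then be passed to the fine-tuned VLM for selection.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _densify(waypoints: List[Point], step: float) -> List[Point]:
    """Linearly interpolate between waypoints so collision checks are dense."""
    out = [waypoints[0]]
    for (ax, ay), (bx, by) in zip(waypoints, waypoints[1:]):
        n = max(1, math.ceil(math.hypot(bx - ax, by - ay) / step))
        for i in range(1, n + 1):
            t = i / n
            out.append((ax + t * (bx - ax), ay + t * (by - ay)))
    return out


def sample_anchor_paths(start: Point, goal: Point, offsets: List[float],
                        step: float = 0.25) -> List[List[Point]]:
    """Build one candidate per lateral offset: start -> anchor -> goal.

    Hypothetical anchor rule: the anchor sits at the midpoint of the
    start-goal segment, shifted perpendicular to it by `offset` metres
    (an offset of 0.0 gives the straight-line path).
    """
    sx, sy = start
    gx, gy = goal
    length = math.hypot(gx - sx, gy - sy)
    if length == 0.0:
        raise ValueError("start and goal must differ")
    # Unit normal to the start-goal direction (positive offsets go left).
    nx, ny = -(gy - sy) / length, (gx - sx) / length
    mx, my = (sx + gx) / 2, (sy + gy) / 2
    return [_densify([start, (mx + off * nx, my + off * ny), goal], step)
            for off in offsets]


def is_collision_free(path: List[Point], obstacles: List[Point],
                      clearance: float) -> bool:
    """Reject a path if any sampled point is closer than `clearance` to an obstacle centre."""
    return all(math.hypot(px - ox, py - oy) >= clearance
               for px, py in path for ox, oy in obstacles)


def feasible_candidates(start: Point, goal: Point, obstacles: List[Point],
                        offsets: List[float], clearance: float = 0.6):
    """Keep only the anchor paths that pass the disk-clearance check."""
    return [p for p in sample_anchor_paths(start, goal, offsets)
            if is_collision_free(p, obstacles, clearance)]
```

For example, with a person standing at (3, 0) between a start at (0, 0) and a goal at (6, 0), offsets of -1.5, 0.0, and 1.5 m leave two feasible detours, and the straight path is rejected. Choosing which detour is socially appropriate is exactly the part the geometric check cannot decide.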

Abstract

Navigating socially in human environments requires more than satisfying geometric constraints, as collision-free paths may still interfere with ongoing activities or conflict with social norms. Addressing this challenge calls for analyzing interactions between agents and incorporating common-sense reasoning into planning. This paper presents a social robot navigation framework that integrates geometric planning with contextual social reasoning. The system first extracts obstacles and human dynamics to generate geometrically feasible candidate paths, then uses a fine-tuned vision-language model (VLM) to evaluate these paths against contextually grounded social expectations and select a socially optimized path for the controller. This task-specific VLM distills social reasoning from large foundation models into a smaller, more efficient model, allowing the framework to adapt in real time across diverse human-robot interaction contexts. Experiments in four social navigation contexts show that our method achieves the best overall performance, with the lowest personal space violation duration, the shortest pedestrian-facing time, and no social zone intrusions. Project page: https://path-etiquette.github.io
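
The first stage described in the abstract, extracting human dynamics, and the fusion of LiDAR depth with odometry ego-pose shown in Figure 3 can be illustrated with a minimal sketch. It assumes each detection yields a range (from LiDAR) and a bearing (from the image detection), a planar odometry pose (x, y, yaw), finite-difference velocity between frames, and a constant-velocity motion model. The names `Pose2D`, `robot_to_world`, and `predict_constant_velocity` are hypothetical, and the paper's actual tracker and motion predictor may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    x: float
    y: float
    yaw: float  # robot heading in the world frame, radians


def detection_to_robot_frame(rng: float, bearing: float):
    """Convert an assumed range/bearing detection to x (forward), y (left) in the robot frame."""
    return rng * math.cos(bearing), rng * math.sin(bearing)


def robot_to_world(pose: Pose2D, px: float, py: float):
    """Rotate a robot-frame point by the odometry yaw, then translate by the robot position."""
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    return pose.x + c * px - s * py, pose.y + s * px + c * py


def estimate_velocity(p_prev, p_curr, dt: float):
    """Finite-difference velocity between two consecutive world-frame positions."""
    return (p_curr[0] - p_prev[0]) / dt, (p_curr[1] - p_prev[1]) / dt


def predict_constant_velocity(pos, vel, horizon: float, dt: float):
    """Roll the position forward under an assumed constant-velocity model."""
    steps = int(round(horizon / dt))
    return [(pos[0] + vel[0] * k * dt, pos[1] + vel[1] * k * dt)
            for k in range(1, steps + 1)]
```

For instance, a robot at (1, 2) facing +y (yaw = pi/2) that sees a person 2 m straight ahead places that person at roughly (1, 4) in world coordinates. The predicted positions are the kind of input a prediction-fused costmap would consume.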

Paper Structure (24 sections, 1 equation, 11 figures, 2 tables)

Introduction
Related Work
Navigation in Crowds
Social Robot Navigation without Foundation Models
Utilizing Foundation Models for Robot Navigation
Overview
Formulation
System Design
Social Navigation with Path Selection
Human Motion Extraction
Prediction-Fused Costmap Generation
Path Planning
Social-compliance Selection
Visual Prompting
Fine-tuning
...and 9 more sections

Figures (11)

Figure 1: Illustration of robot navigation in a scenario with three geometrically feasible sampled paths, where the robot should reason about social conventions to select the most appropriate path.
Figure 2: System overview. Geometric constraints are extracted from human motion and costmap modules using sensor data. Collision-free path candidates are sampled, projected into the image, and evaluated by a fine-tuned VLM. The selection is fed back as reference to retrieve a path for the local controller.
Figure 3: Human Motion Extraction. The module detects and tracks humans using images, then fuses depth information from LiDAR point clouds with ego-pose from odometry to estimate human states in global coordinates.
Figure 4: Illustration of Prediction-Fused Costmap Generation.
Figure 5: Path Planning. Anchors are the main means of obtaining detoured but still collision-free path candidates.
...and 6 more figures
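
Figure 2 states that collision-free candidates are projected into the image before the VLM evaluates them. A minimal pinhole-camera sketch of that projection is given below. It assumes flat ground (z = 0), a level camera at a known height, known intrinsics (fx, fy, cx, cy), and no lens distortion. The names `world_to_camera` and `project_path` are hypothetical, and how the paper actually draws or labels paths for visual prompting is not covered by this summary.

```python
import math


def world_to_camera(pt, cam_x, cam_y, cam_yaw, cam_height):
    """Express a ground point (world x, y, z = 0) in an optical camera frame.

    Assumes a level camera at height `cam_height` looking along `cam_yaw`;
    optical axes are x right, y down, z forward.
    """
    dx, dy = pt[0] - cam_x, pt[1] - cam_y
    c, s = math.cos(cam_yaw), math.sin(cam_yaw)
    forward = c * dx + s * dy   # distance along the viewing direction
    left = -s * dx + c * dy     # lateral offset, positive to the left
    # Ground lies cam_height below the camera, i.e. +cam_height along "down".
    return -left, cam_height, forward


def project_path(path, fx, fy, cx, cy, width, height, cam_pose, cam_height):
    """Project ground-plane waypoints to pixels, dropping points that are not visible."""
    pixels = []
    for pt in path:
        xc, yc, zc = world_to_camera(pt, *cam_pose, cam_height)
        if zc <= 1e-6:
            continue  # behind (or at) the camera plane
        u = fx * xc / zc + cx
        v = fy * yc / zc + cy
        if 0 <= u < width and 0 <= v < height:
            pixels.append((int(round(u)), int(round(v))))
    return pixels
```

Each candidate's pixel list could then be drawn as a labeled polyline on the camera frame, giving the VLM a way to refer to candidates by label when it reports its choice.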
