HA-VLN 2.0: An Open Benchmark and Leaderboard for Human-Aware Navigation in Discrete and Continuous Environments with Dynamic Multi-Human Interactions

Yifei Dong; Fengyi Wu; Qi He; Zhi-Qi Cheng; Heng Li; Minghan Li; Zebang Cheng; Yuxuan Zhou; Jingdong Sun; Qi Dai; Alexander G Hauptmann

TL;DR

HA-VLN 2.0 presents a unified benchmark for human-aware Vision-and-Language Navigation that bridges discrete and continuous navigation in dynamic, multi-human environments. It introduces HAPS 2.0, two simulators (HA-VLN-DE/CE), a unified API, and 16,844 socially grounded instructions drawn from HA-R2R, with 910 annotated humans across 428 regions to capture realistic social dynamics. Two baseline agents (HA-VLN-VL and HA-VLN-CMA) and a sim-to-real validation on a real robot demonstrate that explicit social modeling improves robustness and reduces collisions, while an open leaderboard enables transparent comparison. By releasing comprehensive datasets, simulators, baselines, and evaluation protocols, HA-VLN 2.0 provides a robust foundation for safe, socially aware navigation research and real-world deployment.

Abstract

Vision-and-Language Navigation (VLN) has been studied mainly in either discrete or continuous settings, with little attention to dynamic, crowded environments. We present HA-VLN 2.0, a unified benchmark introducing explicit social-awareness constraints. Our contributions are: (i) a standardized task and metrics capturing both goal accuracy and personal-space adherence; (ii) the HAPS 2.0 dataset and simulators modeling multi-human interactions, outdoor contexts, and finer language-motion alignment; (iii) benchmarks on 16,844 socially grounded instructions, revealing sharp performance drops of leading agents under human dynamics and partial observability; and (iv) real-world robot experiments validating sim-to-real transfer, with an open leaderboard enabling transparent comparison. Results show that explicit social modeling improves navigation robustness and reduces collisions, underscoring the necessity of human-centric approaches. By releasing datasets, simulators, baselines, and protocols, HA-VLN 2.0 provides a strong foundation for safe, socially responsible navigation research.
