HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments

Shuijing Liu; Haochen Xia; Fatemeh Cheraghi Pouria; Kaiwen Hong; Neeloy Chakraborty; Zichao Hu; Joydeep Biswas; Katherine Driggs-Campbell

TL;DR

This work tackles robot navigation in crowded and constrained indoor environments by introducing HEIGHT, a structured policy built on a heterogeneous spatio-temporal graph. HEIGHT splits scene inputs into human dynamics and obstacle geometry and applies separate attention mechanisms to robot-human and human-human interactions, which supports reasoning over time and adaptive collision avoidance. In simulation and real-world deployments, it outperforms baselines in success rate, navigation time, and generalization to distribution shifts, and it transfers well from simulation to the real robot. The results highlight the value of explicit scene structure and edge-type specialization for navigation among multiple agents in complex environments.

Abstract

We study the problem of robot navigation in dense and interactive crowds with static constraints such as corridors and furniture. Previous methods fail to consider all types of spatial and temporal interactions among agents and obstacles, leading to unsafe and inefficient robot paths. In this article, we leverage a graph-based representation of crowded and constrained scenarios and propose a structured framework to learn robot navigation policies with deep reinforcement learning. We first split the representations of different inputs and propose a heterogeneous spatio-temporal graph to model distinct interactions among humans, robots, and obstacles. Based on the heterogeneous spatio-temporal graph, we propose HEIGHT, a novel navigation policy network architecture with different components to capture heterogeneous interactions through space and time. HEIGHT utilizes attention mechanisms to prioritize important interactions and a recurrent network to track changes in the dynamic scene over time, encouraging the robot to avoid collisions adaptively. Through extensive simulation and real-world experiments, we demonstrate that HEIGHT outperforms state-of-the-art baselines in terms of success, navigation time, and generalization to domain shifts in challenging navigation scenarios. More information is available at https://sites.google.com/view/crowdnav-height/home.
