EgoNav: Egocentric Scene-aware Human Trajectory Prediction
Weizhuo Wang, C. Karen Liu, Monroe Kennedy
TL;DR
This work tackles egocentric trajectory prediction for wearable robots by conditioning future motion on both the past trajectory and a rich egocentric scene representation. It introduces a diffusion-based predictor that operates on a compact Visual Memory embedding derived from aligned RGBD and semantic data, enabling multimodal future trajectory sampling at real-time rates. Key contributions include the Visual Memory representation, a hybrid DDIM–DDPM sampling scheme for fast yet high-fidelity inference, and a comprehensive egocentric navigation dataset covering diverse indoor and outdoor scenarios. The results show improved collision avoidance and mode coverage over baselines, supporting the approach for safer, scene-aware human–robot collaboration and informing downstream planning and imitation learning tasks.
Abstract
Wearable collaborative robots stand to assist human wearers who need fall prevention assistance or wear exoskeletons. Such a robot needs to continually adapt to the surrounding scene based on egocentric vision and predict the ego motion of the wearer. In this work, we leverage body-mounted cameras and sensors to anticipate the trajectory of human wearers through complex surroundings. To facilitate research in ego-motion prediction, we have collected a comprehensive walking scene navigation dataset centered on the user's perspective. We then present a method to predict human motion conditioned on the surrounding static scene. Our method leverages a diffusion model to produce a distribution of potential future trajectories, taking into account the user's observation of the environment. To that end, we introduce a compact representation to encode the user's visual memory of the surroundings, as well as an efficient sample-generating technique to speed up real-time inference of a diffusion model. We ablate our model and compare it to baselines; the results show that our model outperforms existing methods on key metrics of collision avoidance and trajectory mode coverage.
