Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction
M. Eren Akbiyik, Nedko Savov, Danda Pani Paudel, Nikola Popovic, Christian Vater, Otmar Hilliges, Luc Van Gool, Xi Wang
TL;DR
This work tackles ego-trajectory prediction by integrating the driver's field-of-view with the surrounding environment. It introduces RouteFormer, a multimodal network that fuses past motion, scene data, and driver gaze to forecast future ego-motion, aided by a future-discounted loss and auxiliary supervision. A new Path Complexity Index (PCI) quantifies scenario difficulty, and the GEM dataset provides synchronized gaze, FOV, and GPS data in urban settings to evaluate human-centric prediction models. Empirical results on GEM and DR(eye)VE show RouteFormer surpassing state-of-the-art methods, with substantial gains when incorporating driver FOV, especially in complex, high-PCI situations. The work establishes a human-centric benchmark and demonstrates practical potential for safer driver-assistance systems.
Abstract
Understanding drivers' decision-making is crucial for road safety. Although predicting the ego-vehicle's path is valuable for driver-assistance systems, existing methods mainly focus on external factors such as other vehicles' motions, often neglecting the driver's attention and intent. To address this gap, we infer the ego-trajectory by integrating the driver's gaze and the surrounding scene. We introduce RouteFormer, a novel multimodal ego-trajectory prediction network that combines GPS data, environmental context, and the driver's field-of-view, which comprises first-person video and gaze fixations. We also present the Path Complexity Index (PCI), a new metric for trajectory complexity that enables a more nuanced evaluation of challenging scenarios. To tackle data scarcity and enhance diversity, we introduce GEM, a comprehensive dataset of urban driving scenarios enriched with synchronized driver field-of-view and gaze data. Extensive evaluations on GEM and DR(eye)VE demonstrate that RouteFormer significantly outperforms state-of-the-art methods in prediction accuracy across diverse conditions. Ablation studies reveal that incorporating driver field-of-view data significantly reduces average displacement error, especially in challenging scenarios with high PCI scores, underscoring the importance of modeling driver attention. All data and code are available at https://meakbiyik.github.io/routeformer.
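
For readers unfamiliar with the metric named above, the following is a minimal sketch of average displacement error (ADE), the mean Euclidean distance between predicted and ground-truth positions over the prediction horizon. The second function illustrates one way a future-discounted error could be weighted; the abstract does not specify the paper's loss, so the geometric weighting `gamma ** t`, the parameter name `gamma`, and both function names are assumptions made for illustration only.

```python
import numpy as np


def average_displacement_error(pred, target):
    """Mean Euclidean distance between two trajectories of shape (T, 2)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    # Per-timestep distance, then average over the horizon.
    return float(np.linalg.norm(pred - target, axis=-1).mean())


def discounted_displacement_error(pred, target, gamma=0.97):
    """Weighted mean displacement with weights gamma**t (hypothetical).

    Assumption: later timesteps are down-weighted geometrically. This is an
    illustrative stand-in, not the loss defined in the paper.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    dists = np.linalg.norm(pred - target, axis=-1)  # shape (T,)
    weights = gamma ** np.arange(dists.shape[-1])   # 1, gamma, gamma^2, ...
    return float((weights * dists).sum() / weights.sum())
```

With `gamma=1.0` the discounted variant reduces to plain ADE; smaller values place more emphasis on near-future positions.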
