OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising

Haichao Zhang; Yi Xu; Hongsheng Lu; Takayuki Shimizu; Yun Fu

TL;DR

This work tackles out-of-sight trajectory prediction by introducing OOSTraj, a vision-positioning denoising framework that removes noise from sensor trajectories and maps them into the visual domain to predict the future trajectories of out-of-sight objects. The method combines a Mobile Denoising Encoder (MDE), a Visual-Positioning Denoising Module (VPD), a Visual Positioning Projection Module (VPP), a Camera Parameters Estimator (CPE), and an Out-of-Sight Prediction Decoder (OPD), with a denoising loss that guides unsupervised learning through the available visual cues. It achieves state-of-the-art results on the Vi-Fi and JRDB datasets, and plug-and-play experiments show that adding VPD improves baselines in both denoising and future trajectory prediction. By handling non-visible objects and sensor noise, the approach aims to improve the safety and reliability of autonomous driving in complex environments. The code is publicly available.

Abstract

Trajectory prediction is fundamental in computer vision and autonomous driving, particularly for understanding pedestrian behavior and enabling proactive decision-making. Existing approaches in this field often assume precise and complete observational data, neglecting the challenges associated with out-of-view objects and the noise inherent in sensor data due to limited camera range, physical obstructions, and the absence of ground truth for denoised sensor data. Such oversights are critical safety concerns, as they can result in missing essential, non-visible objects. To bridge this gap, we present a novel method for out-of-sight trajectory prediction that leverages a vision-positioning technique. Our approach denoises noisy sensor observations in an unsupervised manner and precisely maps sensor-based trajectories of out-of-sight objects into visual trajectories. This method has demonstrated state-of-the-art performance in out-of-sight noisy sensor trajectory denoising and prediction on the Vi-Fi and JRDB datasets. By enhancing trajectory prediction accuracy and addressing the challenges of out-of-sight objects, our work significantly contributes to improving the safety and reliability of autonomous driving in complex environments. Our work represents the first initiative towards Out-Of-Sight Trajectory prediction (OOSTraj), setting a new benchmark for future research. The code is available at https://github.com/Hai-chao-Zhang/OOSTraj.

Paper Structure (25 sections, 10 equations, 2 figures, 3 tables)

This paper contains 25 sections, 10 equations, 2 figures, 3 tables.

Introduction
Related Works
Vision-Wireless Fusion
Obstructed Trajectory Prediction
Problem Definition
Symbol Annotations
Task Definition
Methodology
Mobile Denoising Encoder (MDE)
Visual-Positioning Denoising Module (VPD)
Visual Positioning Projection Module (VPP)
Camera Parameters Estimator (CPE)
Denoising Loss
Out-of-Sight Prediction Decoder (OPD)
Implementation Details
...and 10 more sections

Figures (2)

Figure 1: A representative illustration of real-world out-of-sight scenarios in autonomous driving. The autonomous vehicle is equipped with a camera (capturing precise visual trajectories, indicated by green dotted arrows) and a mobile signal receiver (capturing noisy sensor trajectories, represented by red dotted arrows) for tracking pedestrians and other vehicles. Pedestrians P1 and P2 are within the camera's field of view, while P3 is entirely out of sight and P4 is obscured by other vehicles. Consequently, P3 and P4 lack captured visual trajectories and are positioned dangerously, potentially crossing into the vehicle's path, posing a risk of collision. The black dotted arrows depict the hypothesized noise-free real trajectories, ideally captured by mobile sensors, contrasting with the actual noisy sensor trajectories (red arrows). The gray area in the figure demarcates the visibility range of the mobile and visual modalities: white indicates no data captured, orange signifies the presence of visual trajectories, and blue represents the availability of mobile trajectories.
Figure 2: Overview of the Vision-Positioning Denoising and Predicting Model architecture. This illustration highlights the processing of pedestrian data, where pedestrians P3 and P4 are detectable only by mobile receivers, while P1 and P2 are visible to both camera and mobile receivers. The Camera Parameters Estimator Module utilizes the dual-modality trajectories of in-view pedestrians (like P1 and P2) to analyze the relationship between camera and world coordinates, resulting in a camera matrix embedding. For out-of-sight pedestrians (e.g., P3, P4), their noisy mobile trajectories are first refined by the Mobile Denoising Encoder, producing a denoised signal embedding. This embedding is then merged with the matrix embedding in the Visual Positioning Projection Module, facilitating the mapping of data into camera coordinates, with the application of $\mathcal{L}_{\text{Denoise}}$. Finally, the Out-of-Sight Prediction Decoder leverages the denoised visual signals to predict the trajectories of pedestrians not captured by the camera.
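
The mapping from world coordinates into camera coordinates mentioned in the Figure 2 caption can be illustrated with a classical pinhole projection. The following is a minimal sketch under stated assumptions, not the paper's method: OOSTraj learns a camera matrix embedding with neural modules, whereas this example uses explicit, hand-picked camera parameters. The function names (`project_point`, `in_frame`) and all numeric values (focal length, principal point, image size) are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project_point(
    point_world: Vec3,
    rotation: Sequence[Sequence[float]],
    translation: Vec3,
    focal: Tuple[float, float],
    principal: Tuple[float, float],
) -> Optional[Pixel]:
    """Project a world-frame point to pixel coordinates with a pinhole model.

    Returns None when the point lies on or behind the camera plane.
    All camera parameters here are illustrative assumptions.
    """
    # World frame -> camera frame: X_c = R * X_w + t
    x_c, y_c, z_c = (
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if z_c <= 0:
        return None
    # Perspective division, then shift by the principal point.
    u = focal[0] * x_c / z_c + principal[0]
    v = focal[1] * y_c / z_c + principal[1]
    return (u, v)


def in_frame(pixel: Optional[Pixel], width: int, height: int) -> bool:
    """True if the projected pixel falls inside the image bounds."""
    if pixel is None:
        return False
    u, v = pixel
    return 0 <= u < width and 0 <= v < height


if __name__ == "__main__":
    # Hypothetical camera: no rotation, at the world origin, 1920x1080 image.
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    origin = (0.0, 0.0, 0.0)
    focal = (1000.0, 1000.0)
    principal = (960.0, 540.0)

    # One position in front of the camera, one far to the side.
    for point in [(1.0, 0.5, 10.0), (20.0, 0.0, 10.0)]:
        pixel = project_point(point, identity, origin, focal, principal)
        print(point, pixel, in_frame(pixel, 1920, 1080))
```

In this simplified picture, a position whose projection falls outside the image bounds has no visual observation. Occlusion (as for P4 in Figure 1) is a second cause of missing visual trajectories that a pure field-of-view check like `in_frame` does not capture.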
