
Streaming Real-Time Trajectory Prediction Using Endpoint-Aware Modeling

Alexander Prutsch, David Schinagl, Horst Possegger

TL;DR

This work achieves state-of-the-art streaming trajectory prediction results on the Argoverse 2 multi-agent and single-agent benchmarks while requiring substantially fewer resources.

Abstract

Future trajectories of neighboring traffic agents have a significant influence on the path planning and decision-making of autonomous vehicles. While trajectory forecasting is a well-studied field, research has mainly focused on snapshot-based prediction, where each scenario is treated independently of its global temporal context. However, real-world autonomous driving systems operate in a continuous setting, which requires real-time, low-latency processing of data streams and consistent predictions over successive timesteps. We leverage this continuous setting to propose a lightweight yet highly accurate streaming-based trajectory forecasting approach. We integrate valuable information from previous predictions with a novel endpoint-aware modeling scheme: our temporal context propagation uses the trajectory endpoints of the previous forecasts as anchors to extract targeted scenario context encodings. This efficiently guides the scene encoder toward highly relevant context information without requiring refinement iterations or segment-wise decoding. Our experiments show that our approach effectively relays information across consecutive timesteps. Unlike methods that rely on multi-stage refinement, our approach significantly reduces inference latency, making it well suited for real-world deployment. We achieve state-of-the-art streaming trajectory prediction results on the Argoverse 2 multi-agent and single-agent benchmarks while requiring substantially fewer resources.
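The endpoint-aware propagation described above can be illustrated with a small NumPy sketch. It is not the authors' implementation; it only shows the general idea under our own assumptions: previous predictions are re-expressed in the current agent-centric frame using the focal agent's pose (x, y, yaw), and "targeted context" is approximated by selecting context elements within a fixed radius of each previous endpoint. The frame dictionary keys (`obs`, `pose`, `context_xy`), the `predict` callable, and the radius are hypothetical. Aligning the forecast horizons of consecutive passes is ignored.

```python
from collections import deque

import numpy as np


def rot(yaw):
    """2x2 rotation matrix for a heading angle in radians."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s], [s, c]])


def rigid_transform(points, pose_from, pose_to):
    """Re-express (N, 2) points given in the local frame of `pose_from` in the
    local frame of `pose_to`. Poses are (x, y, yaw) in a shared world frame (assumed)."""
    world = points @ rot(pose_from[2]).T + np.asarray(pose_from[:2])  # local -> world
    return (world - np.asarray(pose_to[:2])) @ rot(pose_to[2])       # world -> local


def endpoint_context(prev_trajs, context_xy, radius=10.0):
    """For each of the K previous modes (prev_trajs: (K, T, 2)), return indices of
    context elements (context_xy: (N, 2)) within `radius` of that mode's endpoint.
    The radius-based selection is a hypothetical stand-in for targeted encoding."""
    endpoints = prev_trajs[:, -1, :]                                         # (K, 2)
    dists = np.linalg.norm(context_xy[None] - endpoints[:, None], axis=-1)  # (K, N)
    return [np.flatnonzero(row <= radius) for row in dists]


def run_stream(frames, predict, window=3, radius=10.0):
    """Streaming loop: a sliding window of observations plus endpoint anchors
    relayed from the previous pass. `predict(history, anchors)` is a hypothetical
    model returning (K, T, 2) trajectories in the current agent-centric frame."""
    history = deque(maxlen=window)  # keeps only the most recent `window` observations
    prev = None                     # (trajectories, pose) from the previous pass
    outputs = []
    for frame in frames:
        history.append(frame["obs"])
        anchors = None              # first pass: no previous prediction exists
        if prev is not None:
            trajs, pose = prev
            k, t, _ = trajs.shape
            moved = rigid_transform(trajs.reshape(-1, 2), pose, frame["pose"])
            anchors = endpoint_context(moved.reshape(k, t, 2), frame["context_xy"], radius)
        pred = predict(list(history), anchors)
        outputs.append(pred)
        prev = (pred, frame["pose"])
    return outputs
```

Because the anchors are computed once per frame from the previous output, this kind of propagation adds no refinement iterations; the model is still run a single time per frame.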

Paper Structure

This paper contains 42 sections, 10 figures, and 11 tables.

Figures (10)

  • Figure 1: Traffic scenes are constantly changing, a fact often neglected by trajectory prediction methods that operate on snapshots rather than in a continuous setting. For example, when predicting the turning maneuver of the focal agent (green), a possible interaction with a pedestrian (red) near the endpoint of the previous estimation (at time $t-1$) is important context that must be considered in the current prediction (at time $t$). We model this temporal context propagation with a novel endpoint streaming mechanism, achieving accurate predictions at minimal latency.
  • Figure 2: Overview of our streaming-based trajectory prediction architecture SEAM. We assume that no previous prediction exists for our focal agent at frame $t-1$, so only a standard forward pass of the trajectory prediction model is executed. In the next frame $t$, we use the endpoints of the previous predictions to aggregate target-centric context information, encode it with a second encoder path, and provide it as additional input to our novel dual-context decoder.
  • Figure 3: Architecture of our novel dual-context attention decoder, leveraging both agent-centric $S^t$ and target-centric $T^t$ features.
  • Figure 4: Qualitative results on two Argoverse 2 scenarios. We show the predictions of our streaming-based method at $t \in \{3, 4, 5\}$s, together with the ground-truth future, agent histories, and neighboring agents. The right column shows the final predictions of RealMotion [song2024realmotion] at $t = 5$s in the streaming-based setting. Top row: the focal agent approaches an intersection where other traffic is currently passing by, making its possible movements difficult to identify. Bottom row: a pedestrian crosses the street at an intersection to the right. Our approach correctly predicts that the vehicle can either continue straight or turn right, and that it may wait before the crosswalk or proceed directly if the pedestrian has already left it.
  • Figure 5: Comparison between the snapshot-based and streaming-based trajectory prediction paradigms. Both approaches operate on the same benchmark input data, ensuring a fair comparison without additional data. In the streaming paradigm, past observations are processed using a sliding window, closely resembling practical deployment conditions. To reflect the challenges of real-world operation, streaming processing establishes mechanisms that relay information between successive passes. This enables more consistent and temporally coherent predictions than snapshot-based processing, which handles each frame independently. In this example, predictions from the third pass of the streaming model can be directly evaluated against those of the snapshot-based approach.
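The dual-context decoder in Figure 3 combines agent-centric features $S^t$ with target-centric features $T^t$. The NumPy sketch below shows one plausible way such a decoder could consume both feature sets; it is an assumption, not the published architecture. Mode queries first cross-attend to $S^t$ and then, only when a previous prediction exists (as in Figure 2), to $T^t$, each with a residual connection. Learned projections, multiple heads, feed-forward layers, and normalization are omitted, and the names `mode_queries`, `cross_attention`, and `dual_context_decode` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context):
    """Single-head scaled dot-product attention: queries (K, d), context (N, d).
    Learned Q/K/V projections are omitted to keep the sketch short."""
    d = queries.shape[-1]
    weights = softmax(queries @ context.T / np.sqrt(d), axis=-1)  # (K, N), rows sum to 1
    return weights @ context                                    # (K, d)


def dual_context_decode(mode_queries, S_t, T_t=None):
    """Hypothetical fusion order: attend to agent-centric features S_t, then
    (if available) to target-centric features T_t, with residual connections."""
    h = mode_queries + cross_attention(mode_queries, S_t)
    if T_t is not None:  # no previous endpoints -> plain single-context decoding
        h = h + cross_attention(h, T_t)
    return h
```

For example, `dual_context_decode(q, S_t)` covers the first frame of a stream, and `dual_context_decode(q, S_t, T_t)` covers later frames. Attending to the two contexts sequentially is only one design option; parallel attention with concatenation or gating would be equally valid sketches.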