
LiMTR: Time Series Motion Prediction for Diverse Road Users through Multimodal Feature Integration

Camiel Oerlemans, Bram Grooten, Michiel Braat, Alaa Alassi, Emilia Silvas, Decebal Constantin Mocanu

TL;DR

A novel multimodal motion prediction approach is developed that incorporates local LiDAR features through the PointNet foundation model architecture; when integrated into the previous state-of-the-art MTR, it improves performance compared with MTR alone.

Abstract

Accurately predicting the behavior of road users is crucial for the safe operation of autonomous vehicles in urban or densely populated areas. This has driven growing interest in time series motion prediction research and significant advances in state-of-the-art techniques in recent years. However, the potential of LiDAR data to capture more detailed local features, such as a person's gaze or posture, remains largely unexplored. To address this, we develop a novel multimodal approach to motion prediction based on the PointNet foundation model architecture, incorporating local LiDAR features. On the Waymo Open Dataset, integrating this approach into the previous state-of-the-art MTR improves minADE by 6.20% and mAP by 1.58% compared with MTR. We open-source the code of our LiMTR model.
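The minADE metric mentioned above can be illustrated with a short sketch. It assumes the standard definition: for each agent the model outputs K candidate future trajectories, the average displacement error (ADE) of a candidate is the mean L2 distance to the ground truth over all future timesteps, and minADE keeps the best candidate. The function names `min_ade` and `relative_change` are hypothetical, and we assume the reported 6.20% and 1.58% figures are relative changes with respect to MTR. The full Waymo Open Dataset evaluation additionally aggregates over agents and prediction horizons, which is omitted here.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def min_ade(predictions: Sequence[Sequence[Point]], ground_truth: Sequence[Point]) -> float:
    """Minimum average displacement error over K candidate trajectories.

    predictions: K candidate trajectories, each a sequence of T (x, y) positions.
    ground_truth: the observed future trajectory, T (x, y) positions.
    """
    if not predictions or not ground_truth:
        raise ValueError("need at least one prediction and a non-empty ground truth")
    best = math.inf
    for trajectory in predictions:
        if len(trajectory) != len(ground_truth):
            raise ValueError("each prediction must have the same length as the ground truth")
        # ADE: mean Euclidean (L2) distance over all future timesteps.
        ade = sum(math.dist(p, g) for p, g in zip(trajectory, ground_truth)) / len(ground_truth)
        best = min(best, ade)  # keep the closest of the K candidates
    return best


def relative_change(baseline: float, ours: float, lower_is_better: bool) -> float:
    """Improvement over a baseline in percent (positive means better).

    Hypothetical helper: assumes reported gains are relative to the baseline value.
    """
    gain = baseline - ours if lower_is_better else ours - baseline
    return 100.0 * gain / baseline


# Toy example (illustrative values, not results from the paper).
ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
predictions = [
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],  # offset by 1 m at every step: ADE = 1.0
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.3)],  # only the last step is off, by 0.3 m
]
print(min_ade(predictions, ground_truth))
```

Because minADE takes the minimum over candidates, it rewards a model if any one of its K hypotheses is close to what actually happened; minADE is lower-is-better, while mAP is higher-is-better, hence the `lower_is_better` flag.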

Paper Structure

This paper contains 26 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Our LiDAR encoder. Point compression in green, time in orange, with the final feature in blue. The variables represent timesteps (T), points (N), and feature dimension per point (D).
  • Figure 2: LiMTR architecture based on the MTR model (shi2022motion), with our LiDAR encoder in green.
  • Figure 3: Scaling of the LiDAR encoder network: model performance (minADE) plotted against the number of parameters, with a horizontal line marking baseline MTR performance without LiDAR and a fitted exponential function.
  • Figure 4: (Left) Performance of LiMTR (ours) and MTR. (Right) Local LiDAR data (in blue) of a cyclist and a pedestrian in a scene. Our LiMTR model receives time series LiDAR data at 10 Hz, capturing the movement of such point clouds per target object.
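The Figure 1 caption describes the LiDAR encoder only at the level of tensor shapes: a time series of T LiDAR frames for one target object, each with N points of D features, is compressed per frame and then over time into a single feature vector. The sketch below is a minimal NumPy illustration under our own assumptions, not the LiMTR implementation. Point compression follows the PointNet idea of a shared per-point MLP followed by a symmetric max-pool over the N points; the temporal step is a hypothetical flatten-and-project layer. All function names, layer sizes, and the random weights are illustrative, and how the resulting feature is fused into the MTR-based architecture of Figure 2 is not modeled.

```python
import numpy as np


def shared_mlp(x: np.ndarray, layers: list) -> np.ndarray:
    """Apply the same MLP to every point; features live on the last axis."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)  # ReLU between layers
    return x


def lidar_encoder(points: np.ndarray, point_mlp: list, time_proj: tuple) -> np.ndarray:
    """Encode a (T, N, D) LiDAR time series for one agent into a single feature vector."""
    num_steps = points.shape[0]
    # Point compression: per-point MLP, then max-pool over the N points.
    # The max is symmetric, so the result does not depend on point order.
    per_point = shared_mlp(points, point_mlp)  # (T, N, F)
    per_step = per_point.max(axis=1)  # (T, F)
    # Time (hypothetical choice): flatten the T per-step features and project.
    w, b = time_proj
    return per_step.reshape(num_steps * per_step.shape[1]) @ w + b  # (F_out,)


def init_params(rng: np.random.Generator, d: int, hidden: int, f: int, t: int, f_out: int):
    """Random weights for illustration only; a real model would learn them."""
    point_mlp = [
        (rng.normal(0.0, 0.1, (d, hidden)), np.zeros(hidden)),
        (rng.normal(0.0, 0.1, (hidden, f)), np.zeros(f)),
    ]
    time_proj = (rng.normal(0.0, 0.1, (t * f, f_out)), np.zeros(f_out))
    return point_mlp, time_proj


# Hypothetical sizes: 11 frames, 64 points of (x, y, z, intensity) per frame.
T, N, D = 11, 64, 4
rng = np.random.default_rng(0)
point_mlp, time_proj = init_params(rng, d=D, hidden=32, f=64, t=T, f_out=128)
points = rng.normal(size=(T, N, D))
feature = lidar_encoder(points, point_mlp, time_proj)
print(feature.shape)  # (128,)
```

Because the max-pool is permutation-invariant, shuffling the points within a frame leaves the output unchanged, which suits unordered LiDAR point clouds. The flatten-and-project time step, by contrast, is order-sensitive, so the feature can reflect how an object's point cloud moves across frames.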