SIMPL: A Simple and Efficient Multi-agent Motion Prediction Baseline for Autonomous Driving

Lu Zhang; Peiliang Li; Sikang Liu; Shaojie Shen

TL;DR

SIMPL is a simple and efficient baseline for multi-agent motion prediction in autonomous driving. It combines an instance-centric scene representation with a compact symmetric fusion Transformer, so that trajectories for all road users are predicted in a single feed-forward pass in real time. Future trajectories are parameterized as Bézier curves built on Bernstein basis polynomials, which allows states and their higher-order derivatives to be evaluated at any time point for downstream planning. On the Argoverse 1 and 2 motion forecasting benchmarks, SIMPL achieves accuracy competitive with state-of-the-art methods while remaining lightweight and low-latency, and its design is intended to be easy to extend and deploy onboard.

Abstract

This paper presents a Simple and effIcient Motion Prediction baseLine (SIMPL) for autonomous vehicles. Unlike conventional agent-centric methods with high accuracy but repetitive computations and scene-centric methods with compromised accuracy and generalizability, SIMPL delivers real-time, accurate motion predictions for all relevant traffic participants. To achieve improvements in both accuracy and inference speed, we propose a compact and efficient global feature fusion module that performs directed message passing in a symmetric manner, enabling the network to forecast future motion for all road users in a single feed-forward pass and mitigating accuracy loss caused by viewpoint shifting. Additionally, we investigate the continuous trajectory parameterization using Bernstein basis polynomials in trajectory decoding, allowing evaluations of states and their higher-order derivatives at any desired time point, which is valuable for downstream planning tasks. As a strong baseline, SIMPL exhibits highly competitive performance on Argoverse 1 & 2 motion forecasting benchmarks compared with other state-of-the-art methods. Furthermore, its lightweight design and low inference latency make SIMPL highly extensible and promising for real-world onboard deployment. We open-source the code at https://github.com/HKUST-Aerial-Robotics/SIMPL.
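The continuous trajectory parameterization mentioned in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It evaluates a 2D Bézier curve from its control points using Bernstein basis polynomials and obtains the velocity from the derivative curve (the hodograph). The function names, the `duration` parameter, and the example control points are assumptions made for illustration; in SIMPL the control points would presumably come from the trajectory decoder.

```python
from math import comb


def bernstein(n, i, t):
    """Bernstein basis polynomial b_{i,n}(t) for normalized time t in [0, 1]."""
    return comb(n, i) * t**i * (1.0 - t) ** (n - i)


def bezier_point(ctrl, t):
    """Evaluate a Bezier curve with 2D control points `ctrl` at t in [0, 1]."""
    n = len(ctrl) - 1  # curve degree
    x = sum(bernstein(n, i, t) * p[0] for i, p in enumerate(ctrl))
    y = sum(bernstein(n, i, t) * p[1] for i, p in enumerate(ctrl))
    return (x, y)


def hodograph(ctrl):
    """Control points of the derivative curve w.r.t. t (degree n - 1)."""
    n = len(ctrl) - 1
    return [(n * (q[0] - p[0]), n * (q[1] - p[1])) for p, q in zip(ctrl, ctrl[1:])]


def velocity(ctrl, t, duration=1.0):
    """Velocity at normalized time t for a trajectory lasting `duration` seconds.

    The chain rule gives d/d(time) = (1 / duration) * d/dt.
    """
    vx, vy = bezier_point(hodograph(ctrl), t)
    return (vx / duration, vy / duration)


# Hypothetical septic (degree-7) curve: 8 evenly spaced control points on a line.
ctrl = [(float(i), 0.0) for i in range(8)]
position_mid = bezier_point(ctrl, 0.5)
velocity_mid = velocity(ctrl, 0.5, duration=1.0)
```

Because the derivative of a Bézier curve is itself a Bézier curve, higher-order derivatives such as acceleration can be obtained by applying `hodograph` repeatedly, dividing by `duration` once per derivative order.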

Paper Structure (28 sections, 11 equations, 8 figures, 5 tables)

Introduction
Related Work
Context Encoding and Fusion
Symmetric Scene Modeling
Trajectory Representation
Methodology
Problem Formulation
Framework Overview
Instance-centric Scene Representation
Context Feature Encoding
Symmetric Fusion Transformer
Multimodal Continuous Trajectory Decoder
Training
Experimental Results
Experiment Setup
...and 13 more sections

Figures (8)

Figure 1: Illustration of multi-agent motion prediction in complex driving scenarios. Our method is able to generate reasonable hypotheses for all relevant agents simultaneously in a real-time fashion. The ego and other vehicles are shown in red and blue, respectively. Predicted trajectories are visualized using gradient color according to the timestamps. Please refer to the attached video for more examples.
Figure 2: Illustration of SIMPL. We utilize the simplest possible network architecture to demonstrate its effectiveness. The local features of semantic instances are processed by simple encoders, while the inter-instance features are preserved in the relative positional embeddings. Multimodal trajectory prediction results are generated by the motion decoder after the proposed symmetric feature Transformer.
Figure 3: Illustration of the relative pose calculation. A typical scene is depicted on the left, and we leave out the $y$-axis of the anchor poses for conciseness. The relative pose between instance $i$ and $j$ can be described by the heading difference $\alpha_{i\rightarrow j}$, relative azimuth $\beta_{i\rightarrow j}$, and positional distance $\lVert\mathbf{d}_{i\rightarrow j}\rVert$. The all-to-all relative poses are calculated and formulated as a 3D array.
Figure 4: Illustration of the proposed symmetric fusion Transformer (SFT) with $L$ layers. Instance tokens and RPE are recurrently updated in each SFT layer.
Figure 5: A 2D septic Bézier curve (left). Pink dots are control points, and grey polygons are the corresponding convex hulls. When the trajectory duration is 1 second, the first-order derivative is exactly the velocity profile (right), which is also a Bézier curve due to the hodograph property.
...and 3 more figures
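The relative pose calculation described in the Figure 3 caption can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes each anchor pose is given as `(x, y, heading)` in a shared global frame, that the heading difference is measured as the heading of $j$ minus the heading of $i$, and that the relative azimuth is the bearing of $j$ as seen from $i$, measured from the heading of $i$. The function names and these sign conventions are assumptions.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def relative_pose(pose_i, pose_j):
    """Return (alpha, beta, distance) from instance i to instance j.

    alpha: heading difference, beta: relative azimuth, distance: ||d_{i->j}||.
    The sign conventions here are assumptions for illustration.
    """
    xi, yi, heading_i = pose_i
    xj, yj, heading_j = pose_j
    dx, dy = xj - xi, yj - yi
    dist = math.hypot(dx, dy)
    alpha = wrap_angle(heading_j - heading_i)
    # The azimuth is undefined when the two anchors coincide; use 0 in that case.
    beta = wrap_angle(math.atan2(dy, dx) - heading_i) if dist > 0.0 else 0.0
    return (alpha, beta, dist)


def all_to_all_relative_poses(poses):
    """Build the N x N x 3 nested list of relative poses between all instances."""
    return [[relative_pose(p_i, p_j) for p_j in poses] for p_i in poses]


# Hypothetical scene with two instances.
poses = [(0.0, 0.0, 0.0), (1.0, 1.0, math.pi / 2)]
rpe = all_to_all_relative_poses(poses)
```

Since every entry depends only on the relation between two anchor poses, the array is unchanged under a global rotation or translation of the scene, which is the kind of viewpoint invariance a symmetric scene representation relies on.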
