Towards Generalizable and Interpretable Motion Prediction: A Deep Variational Bayes Approach

Juanwu Lu; Wei Zhan; Masayoshi Tomizuka; Yeping Hu

TL;DR

This paper proposes the Goal-based Neural Variational Agent (GNeVA), an interpretable generative model for motion prediction with robust generalizability to out-of-distribution cases. The authors identify a causal structure among maps and agents' histories and derive a variational posterior to enhance generalizability.

Abstract

Estimating the potential behavior of surrounding human-driven vehicles is crucial for the safety of autonomous vehicles in mixed traffic flow. Recent state-of-the-art methods achieve accurate prediction using deep neural networks. However, these end-to-end models are usually black boxes with weak interpretability and generalizability. This paper proposes the Goal-based Neural Variational Agent (GNeVA), an interpretable generative model for motion prediction with robust generalizability to out-of-distribution cases. For interpretability, the model achieves target-driven motion prediction by estimating the spatial distribution of long-term destinations with a variational mixture of Gaussians. We identify a causal structure among maps and agents' histories and derive a variational posterior to enhance generalizability. Experiments on motion prediction datasets validate that the fitted model is interpretable and generalizable and achieves performance comparable to state-of-the-art results.

Paper Structure

This paper contains 39 sections, 2 theorems, 25 equations, 6 figures, 6 tables, 1 algorithm.

INTRODUCTION
RELATED WORKS
Generalizability and Interpretability in Motion Prediction Models
Deep Generative Model
METHOD
Problem Statement
Spatial Distribution Model for Goals
Mean Posterior
Precision Posterior
Feature Encoding and Attention Module
Feature Encoding
Attention Module
Proxy $z$-posterior Network
Sampling and Trajectory Completion
Model Training
...and 24 more sections

Key Result

Proposition 1

Goals are multi-modal samples from a mixture of diverse intention distributions, and a single observed goal in the data is a sample from one dominant intention at a specific timestamp.
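
As a rough illustration of Proposition 1, the sketch below draws 2D goals from a fixed two-component Gaussian mixture: an intention is picked first, then a goal is sampled from that intention's distribution, so each drawn goal comes from one dominant mode. It does not implement the paper's variational posterior. The component weights, means, and standard deviations, the axis-aligned covariance, and the names `COMPONENTS`, `sample_goal`, and `sample_goals` are all assumptions made for illustration.

```python
import random

# Hypothetical mixture: each component stands for one intention
# (e.g. "keep cruising" vs "accelerate"). All numbers are made-up
# illustrative values, not parameters from the paper.
COMPONENTS = [
    {"weight": 0.6, "mean": (10.0, 2.0), "std": (1.0, 0.5)},
    {"weight": 0.4, "mean": (18.0, 6.0), "std": (1.5, 0.8)},
]


def sample_goal(components, rng):
    """Pick an intention by its weight, then sample a 2D goal from
    that intention's axis-aligned Gaussian (a simplifying assumption)."""
    weights = [c["weight"] for c in components]
    index = rng.choices(range(len(components)), weights=weights, k=1)[0]
    chosen = components[index]
    x = rng.gauss(chosen["mean"][0], chosen["std"][0])
    y = rng.gauss(chosen["mean"][1], chosen["std"][1])
    return index, (x, y)


def sample_goals(components, n, seed=0):
    """Draw n (intention index, goal) pairs with a seeded generator."""
    rng = random.Random(seed)
    return [sample_goal(components, rng) for _ in range(n)]


if __name__ == "__main__":
    for index, (x, y) in sample_goals(COMPONENTS, 5):
        print(f"intention {index}: goal = ({x:.2f}, {y:.2f})")
```

Each observed goal in this toy setup carries the index of the single intention that generated it, while the pooled samples across many draws form a multi-modal spatial distribution.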

Figures (6)

Figure 1: Example case illustrating the multi-modal distribution of the long-term goal. The target vehicle can maintain its current cruising speed or accelerate inside the roundabout, leading to a spatial distribution with multiple modes.
Figure 2: GNeVA model overview. The input HD map and the history trajectories of all observed traffic participants are first encoded through polyline-based Map and Agent encoders, respectively. Encoded vector features associated with road geometry and agents' histories pass through the context attention and interaction attention modules to derive the posterior distribution parameters of the means and precisions. We then evaluate and sample goals from the posterior predictive distribution. Finally, a trajectory network completes the intermediate paths from current positions to sampled goals.
Figure 3: Graphical model for the GNeVA showing the likelihood family (left) and variational family (right).
Figure 4: Visualization of posterior predictive goal distributions under selected in-distribution and out-of-distribution cases. All cases are selected from the INTERACTION test dataset.
Figure 5: Illustrations of the Attention Modules.
...and 1 more figures

Theorems & Definitions (2)

Proposition 1
Proposition 2
