Controllable Generative Trajectory Prediction via Weak Preference Alignment

Yongxi Cao, Julian F. Schumann, Jens Kober, Joni Pajarinen, Arkady Zgonnikov

TL;DR

PrefCVAE introduces a weak preference alignment mechanism to a conditional VAE, enabling controllable trajectory prediction by embedding semantic attributes into the latent space without degrading predictive accuracy. By sampling paired latent values and using a differentiable preference loss tied to a trajectory utility metric, the model learns a monotonic, semantically meaningful latent space that can steer predictions (e.g., average velocity). Evaluations on nuScenes with a Beta-augmented AgentFormer show that latent control is achievable, the encoder better recovers the latent attributes, and the approach remains competitive on standard accuracy metrics. This work offers a cost-effective method to integrate semantic planning cues into generative models for safer, more informed autonomous driving decisions.

Abstract

Deep generative models such as conditional variational autoencoders (CVAEs) have shown great promise for predicting trajectories of surrounding agents in autonomous vehicle planning. State-of-the-art models have achieved remarkable accuracy in such prediction tasks. Besides accuracy, diversity is also crucial for safe planning because human behaviors are inherently uncertain and multimodal. However, existing methods generally lack a scheme for generating controllably diverse trajectories, which are arguably more useful for safe planning than randomly diversified ones. To address this, we propose PrefCVAE, an augmented CVAE framework that uses weakly labeled preference pairs to imbue latent variables with semantic attributes. Using average velocity as an example attribute, we demonstrate that PrefCVAE enables controllable, semantically meaningful predictions without degrading baseline accuracy. Our results show the effectiveness of preference supervision as a cost-effective way to enhance sampling-based generative models.
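To make the idea of preference supervision on paired latent samples more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a logistic (Bradley-Terry-style) pairwise loss, a scalar latent in $[0, 1]$ drawn from a Beta distribution (suggested only by the "Beta-augmented" wording), and a 0.5 s time step. The names `average_speed`, `toy_decoder`, and `preference_loss` are hypothetical, and `toy_decoder` is a stand-in for a trained CVAE decoder. The loss here is written in plain Python for readability; in training it would be expressed in an autodiff framework so that gradients reach the decoder.

```python
import math
import random


def average_speed(trajectory, dt=0.5):
    """Mean speed over a list of (x, y) waypoints spaced dt seconds apart.

    dt=0.5 is an assumption for illustration, not a value taken from the paper.
    """
    pairs = zip(trajectory[:-1], trajectory[1:])
    dists = [math.hypot(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in pairs]
    return sum(dists) / (dt * len(dists))


def toy_decoder(z, horizon=6, dt=0.5, gain=4.0):
    """Hypothetical stand-in for a decoder: a straight line driven at speed gain * z."""
    return [(gain * z * dt * k, 0.0) for k in range(horizon + 1)]


def preference_loss(z_a, z_b, decoder, metric, temperature=1.0):
    """Assumed logistic pairwise loss on two latent samples.

    The weak label is simply the ordering of the latents: the trajectory decoded
    from the larger latent is "preferred" to have the larger metric value.
    The loss is small when that ordering holds and large when it is violated.
    """
    m_a = metric(decoder(z_a))
    m_b = metric(decoder(z_b))
    sign = 1.0 if z_a > z_b else -1.0
    margin = sign * (m_a - m_b) / temperature
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin)
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    random.seed(0)
    # Draw a pair of latent values; the Beta(2, 2) prior is an illustrative assumption.
    z_a, z_b = random.betavariate(2, 2), random.betavariate(2, 2)
    print(preference_loss(z_a, z_b, toy_decoder, average_speed))
```

Because the label comes from the ordering of the sampled latents rather than from human annotation, this kind of loss needs no extra labeled data, which is one reading of why the supervision is described as weak and cost-effective.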

Paper Structure

This paper contains 29 sections, 2 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Motivation of controllable prediction: the most likely prediction is not always the most accurate one. To account for multimodal futures, controllable prediction should reason about the interaction semantically and predict accordingly.
  • Figure 2: Requirements of controllable prediction: (1) The encoder constrains the latent distribution to estimate a predefined semantic factor $m(\cdot)$; (2) Latent assignments generate semantically controllable predictions (sampling line colors correspond to trajectories on the right).
  • Figure 3: (a) Controllable predictions, demonstrating the utility of the PrefCVAE decoder. In each column, top: prediction visualization; middle: semantic metric with respect to the $z$ value (horizontal dashed lines are ground-truth values); bottom: ADE (solid) and FDE (dashed). PrefCVAE can control the prediction: for a model trained with PrefCVAE, a larger $z$ value always leads to a larger average velocity, as learned with the preference loss. Notably, the best accuracy occurs around the latent values that correspond to the ground-truth velocity (the dashed horizontal lines). (b) Test-set-regressed distributions of the controlled latent factor. Each color corresponds to trajectories predicted with a different ${\mathbf{z}}$ value. The ideal result would resemble 9 Dirac delta distributions with modes at each ground-truth ${\mathbf{z}}$.
  • Figure 4: Violation rate and accuracy with respect to different use rates (Solid: Averaged; Dashed: Best-of-all).
  • Figure 5: Cross configuration tests with latent dimension of 32.