Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic

Mikael Henaff, Alfredo Canziani, Yann LeCun

TL;DR

The paper tackles covariate shift when learning driving policies from observational data by proposing a model-predictive framework that unrolls a learned action-conditional forward model and penalizes trajectory uncertainty. It introduces MPUR and MPER, which backpropagate through multi-step predictions to optimize a combined policy cost and uncertainty cost, with uncertainty estimated via dropout in a variational forward model. The approach reframes learning as a Bayesian-like process, separating epistemic and aleatoric uncertainty and encouraging rollout trajectories to stay within the training manifold. Demonstrations on the NGSIM I-80 driving dataset show that uncertainty-regularized, model-based learning can produce effective driving policies without environment interaction, outperforming several baselines. The work provides a scalable, data-efficient path for learning autonomous driving policies from observational data and offers a public dataset and planning environment for further study.

Abstract

Learning a policy using only observational data is challenging because the distribution of states it induces at execution time may differ from the distribution observed during training. We propose to train a policy by unrolling a learned model of the environment dynamics over multiple time steps while explicitly penalizing two costs: the original cost the policy seeks to optimize, and an uncertainty cost which represents its divergence from the states it is trained on. We measure this second cost by using the uncertainty of the dynamics model about its own predictions, using recent ideas from uncertainty estimation for deep networks. We evaluate our approach using a large-scale observational dataset of driving behavior recorded from traffic cameras, and show that we are able to learn effective driving policies from purely observational data, with no environment interaction.
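The two-cost objective described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy linear `forward_model`, the helper names, the weighting factor `lam`, and the use of prediction variance across random dropout masks as the uncertainty cost. The paper trains neural networks and backpropagates through the unrolled model; this sketch only evaluates the objective for a fixed policy.

```python
import random

def forward_model(state, action, weights, mask):
    """Toy stand-in for a learned dynamics model (hypothetical).
    `mask` plays the role of a dropout mask over hidden units."""
    hidden = [w * m * (state + action) for w, m in zip(weights, mask)]
    keep = sum(mask) or 1  # avoid division by zero if every unit is dropped
    return sum(hidden) / keep

def sample_mask(n, p_drop, rng):
    """Draw a random 0/1 dropout mask of length n."""
    return [0 if rng.random() < p_drop else 1 for _ in range(n)]

def uncertainty_cost(state, action, weights, rng, n_samples=20, p_drop=0.3):
    """Variance of the model's predictions across several dropout masks."""
    preds = [
        forward_model(state, action, weights,
                      sample_mask(len(weights), p_drop, rng))
        for _ in range(n_samples)
    ]
    mean = sum(preds) / len(preds)
    return sum((p - mean) ** 2 for p in preds) / len(preds)

def rollout_objective(state, policy, weights, task_cost, lam, horizon, rng):
    """Unroll the model for `horizon` steps, summing the task cost and
    lam times the uncertainty cost at every step."""
    total = 0.0
    full_mask = [1] * len(weights)  # no dropout for the rollout itself
    for _ in range(horizon):
        action = policy(state)
        total += lam * uncertainty_cost(state, action, weights, rng)
        state = forward_model(state, action, weights, full_mask)
        total += task_cost(state)
    return total
```

Setting `lam` to zero recovers a plain model-based objective, while a larger `lam` places more weight on states where the model's predictions disagree with each other, which is the regime the abstract associates with leaving the training distribution.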

Paper Structure

This paper contains 18 sections, 19 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: Different models fitted on training points which cover a limited region of the function's domain. Models make arbitrary predictions outside of this region.
  • Figure 2: Training the policy network using the stochastic forward model. Gradients with respect to costs associated with predicted states are passed through the unrolled forward model into a policy network.
  • Figure 3: Training the policy network using the differentiable uncertainty cost, calculated using dropout.
  • Figure 4: Preprocessing pipeline for the NGSIM I-80 dataset. Orange arrows show the same vehicles across stages. Blue arrows show the corresponding extracted context states. (a) Snapshots from two of the seven cameras. (b) Viewpoint transformation, car localisation and tracking. (c) Context states are extracted from rectangular regions surrounding each vehicle. (d) Five examples of context states $i_t$ extracted at the previous stage.
  • Figure 5: Video prediction results using a deterministic and stochastic model over 200 time steps (images are subsampled across time). Two different future predictions are generated by the stochastic model by sampling two different sequences of latent variables. The deterministic model averages over possible futures, producing blurred predictions.
  • ...and 5 more figures