Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic
Mikael Henaff, Alfredo Canziani, Yann LeCun
TL;DR
The paper tackles covariate shift when learning driving policies from observational data by proposing a model-predictive framework that unrolls a learned action-conditional forward model and penalizes trajectory uncertainty. It introduces MPUR and MPER, which backpropagate through multi-step predictions to optimize a combined policy and uncertainty objective, with uncertainty estimated via dropout in a variational forward model. The approach reframes learning as a Bayesian-like process, separating epistemic and aleatoric uncertainty and enforcing rollout trajectories to stay within the training manifold. Demonstrations on the NGSIM I-80 driving dataset show that uncertainty-regularized, model-based learning can produce effective driving policies without environment interaction, outperforming several baselines. The work provides a scalable, data-efficient path for learning autonomous driving policies from observational data and offers a public dataset and planning environment for further study.
Abstract
Learning a policy using only observational data is challenging because the distribution of states it induces at execution time may differ from the distribution observed during training. We propose to train a policy by unrolling a learned model of the environment dynamics over multiple time steps while explicitly penalizing two costs: the original cost the policy seeks to optimize, and an uncertainty cost which represents its divergence from the states it is trained on. We measure this second cost by using the uncertainty of the dynamics model about its own predictions, using recent ideas from uncertainty estimation for deep networks. We evaluate our approach using a large-scale observational dataset of driving behavior recorded from traffic cameras, and show that we are able to learn effective driving policies from purely observational data, with no environment interaction.
