Model-Based Imitation Learning for Urban Driving
Anthony Hu, Gianluca Corrado, Nicolas Griffiths, Zak Murez, Corina Gurau, Hudson Yeo, Alex Kendall, Roberto Cipolla, Jamie Shotton
TL;DR
MILE is a camera-only, model-based imitation learning method for urban driving that jointly learns a world model and a driving policy from offline expert demonstrations, with no online interaction with the environment. It uses 3D geometry to lift camera input into a bird's-eye view (BeV) representation, and a latent dynamics model to generate diverse, long-horizon predictions, including plans imagined entirely in latent space. On the CARLA simulator, MILE improves on the prior state of the art by 31% in driving score when deployed in a new town under new weather conditions, and it can execute complex driving manoeuvres from imagined plans. The work also examines how a low-dimensional latent state and probabilistic inference contribute to reliable planning and control. Because training is offline and scales with expert demonstrations, the approach is relevant to real-world deployment, and it suggests avenues for reward learning and self-supervision.
Abstract
An accurate model of the environment and the dynamic agents acting in it offers great potential for improving motion planning. We present MILE: a Model-based Imitation LEarning approach to jointly learn a model of the world and a policy for autonomous driving. Our method leverages 3D geometry as an inductive bias and learns a highly compact latent space directly from high-resolution videos of expert demonstrations. Our model is trained on an offline corpus of urban driving data, without any online interaction with the environment. MILE improves upon prior state-of-the-art by 31% in driving score on the CARLA simulator when deployed in a completely new town and new weather conditions. Our model can predict diverse and plausible states and actions that can be interpretably decoded to bird's-eye view semantic segmentation. Further, we demonstrate that it can execute complex driving manoeuvres from plans entirely predicted in imagination. Our approach is the first camera-only method that models static scene, dynamic scene, and ego-behaviour in an urban driving environment. The code and model weights are available at https://github.com/wayveai/mile.
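To make the idea of planning "in imagination" concrete, the following is a minimal toy sketch of a latent rollout: a policy reads only the latent state, and a stochastic transition prior samples the next latent state without any new observation. It is not the MILE architecture. The class `ToyWorldModel`, the function `imagine`, and the parameters `decay`, `gain`, and `noise_std` are hypothetical names chosen for illustration, and the hand-written dynamics stand in for what would be learned networks.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ToyWorldModel:
    """Hypothetical stand-in for a learned latent dynamics model and policy."""

    decay: float = 0.9      # how strongly the previous latent state persists
    gain: float = 0.5       # how strongly the action moves the latent state
    noise_std: float = 0.1  # standard deviation of the stochastic transition prior

    def policy(self, state):
        # Toy policy: a bounded action computed from the latent state only.
        return math.tanh(-sum(state) / len(state))

    def prior_step(self, state, action, rng):
        # Sample the next latent state from the prior, with no observation.
        return [
            self.decay * s + self.gain * action + rng.gauss(0.0, self.noise_std)
            for s in state
        ]


def imagine(model, initial_state, horizon, seed):
    """Roll out latent states and actions without touching the environment."""
    rng = random.Random(seed)
    state = list(initial_state)
    states, actions = [], []
    for _ in range(horizon):
        action = model.policy(state)                  # act from the latent state
        state = model.prior_step(state, action, rng)  # imagine the next state
        states.append(state)
        actions.append(action)
    return states, actions


if __name__ == "__main__":
    model = ToyWorldModel()
    start = [1.0, -0.5, 0.25]
    # Different seeds sample different imagined futures from the same start.
    for seed in (0, 1):
        _, actions = imagine(model, start, horizon=5, seed=seed)
        print(seed, [round(a, 3) for a in actions])
```

Sampling the transition with different seeds is what yields diverse futures from the same starting state. In a learned model, each imagined latent state could additionally be decoded (for example to a bird's-eye view segmentation) for inspection, which this sketch omits.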
