Learning a Driving Simulator
Eder Santana, George Hotz
TL;DR
The paper tackles learning a driving simulator by first embedding real road-frame videos into a Gaussian latent space using a VAE-GAN hybrid autoencoder, then learning latent-space transitions with an action-conditioned RNN. By training the autoencoder with GAN-based costs, the model generates more realistic frames than purely MSE-based approaches, while the RNN predicts future latent codes to render subsequent frames. The results show the approach can maintain road structure over many frames (up to ~100) but has difficulty with curved trajectories, highlighting the need for more advanced sequence models and sensor fusion. The authors release a driving dataset and training code to encourage further exploration of learned video prediction for driving simulation.
Abstract
Comma.ai's approach to Artificial Intelligence for self-driving cars is based on an agent that learns to clone driver behaviors and plans maneuvers by simulating future events on the road. This paper illustrates one of our research approaches for driving simulation: one where we learn to simulate. Here we investigate variational autoencoders with classical and learned cost functions using generative adversarial networks for embedding road frames. Afterwards, we learn a transition model in the embedded space using action-conditioned Recurrent Neural Networks. We show that our approach can keep predicting realistic-looking video for several frames despite the transition model being optimized without a cost function in the pixel space.
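To make the latent-space transition idea concrete, the following is a minimal NumPy sketch of an action-conditioned recurrent rollout. It is not the authors' implementation. The dimensions, the weight names (W_z, W_a, W_h, W_out), the plain tanh recurrence, and the mean-squared-error cost on codes are all assumptions chosen for illustration, and random weights stand in for a trained model. The encoder and decoder are omitted: z0 stands in for the code of an embedded road frame.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, not taken from the paper.
LATENT_DIM = 16   # size of an embedded frame code
ACTION_DIM = 2    # e.g. steering angle and speed
HIDDEN_DIM = 32   # recurrent state size

# Randomly initialised weights stand in for a trained transition model.
W_z = rng.normal(scale=0.1, size=(HIDDEN_DIM, LATENT_DIM))
W_a = rng.normal(scale=0.1, size=(HIDDEN_DIM, ACTION_DIM))
W_h = rng.normal(scale=0.1, size=(HIDDEN_DIM, HIDDEN_DIM))
W_out = rng.normal(scale=0.1, size=(LATENT_DIM, HIDDEN_DIM))


def transition_step(z, action, h):
    """One recurrent step: update the hidden state, then predict the next code."""
    h_next = np.tanh(W_z @ z + W_a @ action + W_h @ h)
    z_next = W_out @ h_next
    return z_next, h_next


def rollout(z0, actions):
    """Feed each predicted code back in as the next input, one step per action."""
    z, h = z0, np.zeros(HIDDEN_DIM)
    codes = []
    for action in actions:
        z, h = transition_step(z, action, h)
        codes.append(z)
    return np.stack(codes)


def latent_mse(predicted, target):
    """Cost measured on codes rather than pixels (an assumed choice of cost)."""
    return float(np.mean((predicted - target) ** 2))


z0 = rng.normal(size=LATENT_DIM)             # stands in for an encoded road frame
actions = rng.normal(size=(5, ACTION_DIM))   # stands in for logged controls
predicted_codes = rollout(z0, actions)
print(predicted_codes.shape)  # (5, 16)
```

In a full pipeline, each row of predicted_codes would be passed through the autoencoder's decoder to render a frame. Because the cost is computed on codes, the transition model never sees a pixel-space error, which matches the setup the abstract describes.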
