End-to-end Driving via Conditional Imitation Learning
Felipe Codevilla, Matthias Müller, Antonio López, Vladlen Koltun, Alexey Dosovitskiy
TL;DR
End-to-end driving policies trained by imitation learning face perceptuomotor ambiguity at intersections: the camera image alone does not determine which way to turn, so the policy cannot be directed at test time. The authors propose command-conditioned imitation learning, in which a high-level command c conditions the mapping from perception to action, so that a navigator or passenger can steer the policy at test time. They present two architectures (command-input and branched) and show that the branched model performs best both in CARLA simulation and on a 1/5-scale robotic truck. Ablations confirm the importance of noise-injected training data and online augmentation. The results indicate that a command-conditioned end-to-end driving policy can be both controllable and robust while relying on visual input.
Abstract
Deep networks trained on demonstrations of human driving have learned to follow roads and avoid obstacles. However, driving policies trained via imitation learning cannot be controlled at test time. A vehicle trained end-to-end to imitate an expert cannot be guided to take a specific turn at an upcoming intersection. This limits the utility of such systems. We propose to condition imitation learning on high-level command input. At test time, the learned driving policy functions as a chauffeur that handles sensorimotor coordination but continues to respond to navigational commands. We evaluate different architectures for conditional imitation learning in vision-based driving. We conduct experiments in realistic three-dimensional simulations of urban driving and on a 1/5 scale robotic truck that is trained to drive in a residential area. Both systems drive based on visual input yet remain responsive to high-level navigational commands. The supplementary video can be viewed at https://youtu.be/cFtnflNe5fM
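The difference between the two architectures named in the TL;DR (command-input and branched) can be illustrated with a minimal sketch. This is not the authors' implementation: the command names, the two-dimensional action, the plain feature vector standing in for the output of an image encoder, and the random linear heads are all assumptions made for illustration.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical command set; the names are assumptions for this sketch.
COMMANDS = ("follow", "left", "right", "straight")

Vector = List[float]
Head = Callable[[Sequence[float]], Vector]


def random_linear(n_in: int, n_out: int, rng: random.Random) -> Head:
    """Return a random linear map (no bias) standing in for a trained head."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

    def apply(x: Sequence[float]) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return apply


def one_hot(command: str) -> Vector:
    """Encode a command as a one-hot vector over COMMANDS."""
    return [1.0 if c == command else 0.0 for c in COMMANDS]


class CommandInputPolicy:
    """Command-input variant: the command is an extra input to a single head."""

    def __init__(self, n_features: int, n_actions: int, rng: random.Random) -> None:
        self.head = random_linear(n_features + len(COMMANDS), n_actions, rng)

    def act(self, features: Sequence[float], command: str) -> Vector:
        # Concatenate perception features with the encoded command.
        return self.head(list(features) + one_hot(command))


class BranchedPolicy:
    """Branched variant: the command selects one of several specialist heads."""

    def __init__(self, n_features: int, n_actions: int, rng: random.Random) -> None:
        self.branches: Dict[str, Head] = {
            c: random_linear(n_features, n_actions, rng) for c in COMMANDS
        }

    def act(self, features: Sequence[float], command: str) -> Vector:
        # The command acts as a switch; only the chosen branch is evaluated.
        return self.branches[command](list(features))


if __name__ == "__main__":
    rng = random.Random(0)
    features = [0.2, -0.5, 1.0]  # stand-in for image-encoder output
    policy = BranchedPolicy(n_features=3, n_actions=2, rng=rng)
    for command in COMMANDS:
        # Each action is a hypothetical [steering, acceleration] pair.
        print(command, policy.act(features, command))
```

In the command-input variant the network is free to ignore the command, whereas in the branched variant the command necessarily changes which head produces the action. This structural difference is one plausible reading of why the branched model is reported to perform best.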
