Driving from Vision through Differentiable Optimal Control
Flavia Sofia Acerbo, Jan Swevers, Tinne Tuytelaars, Tong Duy Son
TL;DR
DriViDOC is an end-to-end framework that learns to drive from vision by embedding a differentiable NMPC layer whose cost parameters are predicted online from camera images. Because the NMPC layer is differentiable, the whole model can be trained to map high-dimensional visual input to feasible, constraint-satisfying low-level actions, and the variation of the predicted parameters can be interpreted in terms of different human driving styles. Trained by offline behavioral cloning on demonstrations collected on a hexapod driving simulator, the approach outperforms baselines that combine NMPC and neural networks, with an average improvement of about 20% in imitation scores, and it gives insight into how driving styles are realized through the NMPC cost. The work shows that a perception-to-control pipeline can remain interpretable and preserve constraint satisfaction while imitating the styles of different drivers.
Abstract
This paper proposes DriViDOC, a framework for Driving from Vision through Differentiable Optimal Control, and applies it to learning autonomous driving controllers from human demonstrations. DriViDOC combines the automatic inference of relevant features from camera frames with the properties of nonlinear model predictive control (NMPC), such as constraint satisfaction. Our approach leverages the differentiability of parametric NMPC, which allows the driving model to be learned end to end, from images to control. The model is trained on an offline dataset of human demonstrations collected on a motion-base driving simulator. During online testing, the model successfully imitates different driving styles, and interpreting the NMPC parameters gives insight into how specific driving behaviors are achieved. Our experimental results show that DriViDOC outperforms other methods involving NMPC and neural networks, with an average improvement of 20% in imitation scores.
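
The idea of learning through a differentiable parametric optimal control problem can be illustrated with a deliberately tiny sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: a scalar integrator model, a one-step horizon, a quadratic cost with a single effort weight `theta`, the bound `U_MAX`, and the hypothetical helpers `solve_mpc` and `fit_theta`. The camera network is replaced by one learnable scalar. What the sketch does show is the mechanism the abstract relies on: the optimal control depends on the cost parameter, its sensitivity follows from the implicit function theorem applied to the optimality condition, and that sensitivity lets a behavioral cloning loss be minimized by gradient descent on the parameter.

```python
import math

U_MAX = 1.5  # hypothetical actuator bound, chosen only for illustration


def solve_mpc(x, x_ref, theta):
    """Minimise (x + u - x_ref)**2 + theta * u**2 subject to |u| <= U_MAX.

    Toy stand-in for a parametric NMPC layer (assumed model: x_next = x + u).
    Returns the optimal control u* and its sensitivity du*/dtheta.
    """
    # Unconstrained stationarity: g(u, theta) = 2*(x + u - x_ref) + 2*theta*u = 0
    u = (x_ref - x) / (1.0 + theta)
    if abs(u) > U_MAX:
        # Bound is active: the solution sits on the constraint and does not
        # move with theta, so the sensitivity is zero.
        return math.copysign(U_MAX, u), 0.0
    # Implicit function theorem: du*/dtheta = -(dg/dtheta) / (dg/du)
    du_dtheta = -(2.0 * u) / (2.0 + 2.0 * theta)
    return u, du_dtheta


def fit_theta(demos, x_ref=0.0, theta0=0.5, lr=2.0, steps=500):
    """Behavioral cloning of theta from (state, demonstrated control) pairs."""
    w = math.log(theta0)  # optimise log(theta) so theta stays positive
    for _ in range(steps):
        theta = math.exp(w)
        grad = 0.0
        for x, u_demo in demos:
            u, du_dtheta = solve_mpc(x, x_ref, theta)
            # Chain rule: dL/dw = dL/du * du/dtheta * dtheta/dw
            grad += 2.0 * (u - u_demo) * du_dtheta * theta
        w -= lr * grad / len(demos)
    return math.exp(w)


# Synthetic "demonstrations" from a driver whose true effort weight is 3.0
THETA_TRUE = 3.0
states = [-2.0, -1.0, 0.5, 1.0, 2.0]
demos = [(x, solve_mpc(x, 0.0, THETA_TRUE)[0]) for x in states]
theta_hat = fit_theta(demos)
print(f"recovered theta: {theta_hat:.4f}")
```

In this toy setting the recovered weight can be read directly as a driving-style parameter: a larger `theta` penalizes control effort more and yields gentler corrections. A full pipeline would presumably replace the scalar with a network that outputs the parameters from images and the closed-form solve with a numerical NMPC solver, but the gradient path through the optimizer follows the same pattern.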
