Driving from Vision through Differentiable Optimal Control

Flavia Sofia Acerbo, Jan Swevers, Tinne Tuytelaars, Tong Duy Son

TL;DR

DriViDOC is an end-to-end framework that learns to drive from vision by embedding a differentiable NMPC layer whose cost parameters are predicted online from camera inputs. Because the NMPC layer is differentiable, the model can be trained to map high-dimensional visual data to feasible, constraint-satisfying low-level actions, and the resulting parameter variations are interpretable and reflect different human driving styles. Trained via offline behavioral cloning on demonstrations from a hexapod driving simulator, the approach outperforms baselines that combine NMPC and neural networks by about 20% on average in imitation scores, and the learned parameters give insight into how each driving style is realized through the NMPC cost. The work illustrates that a perception-to-control pipeline can remain interpretable and preserve constraint satisfaction while imitating the styles of different drivers.

Abstract

This paper proposes DriViDOC: a framework for Driving from Vision through Differentiable Optimal Control, and its application to learn autonomous driving controllers from human demonstrations. DriViDOC combines the automatic inference of relevant features from camera frames with the properties of nonlinear model predictive control (NMPC), such as constraint satisfaction. Our approach leverages the differentiability of parametric NMPC, allowing for end-to-end learning of the driving model from images to control. The model is trained on an offline dataset comprising various human demonstrations collected on a motion-base driving simulator. During online testing, the model demonstrates successful imitation of different driving styles, and the interpreted NMPC parameters provide insights into the achievement of specific driving behaviors. Our experimental results show that DriViDOC outperforms other methods involving NMPC and neural networks, exhibiting an average improvement of 20% in imitation scores.
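To make the idea of learning through a differentiable optimal control layer concrete, the following is a minimal sketch and not the authors' implementation. It replaces the constrained NMPC with a scalar, one-step, unconstrained problem that has a closed-form solution, and it replaces the image-conditioned neural network with a single learnable offset. All names (`solve_ocp`, `solution_sensitivity`, `fit_offset`), the cost form, and the constants `DT` and `W_U` are assumptions made for illustration. The sensitivity of the optimal control to the cost parameters comes from the implicit function theorem applied to the stationarity condition, and a behavioral cloning loss is then minimized by gradient descent.

```python
# Toy stand-in for a differentiable optimal-control layer.
# Every name and constant here is hypothetical; the paper's NMPC is
# nonlinear, multi-step, and constrained, which this sketch is not.

DT = 0.1     # assumed integration step [s]
W_U = 0.05   # assumed fixed weight on control effort


def solve_ocp(x, w_d, d_ref):
    """Return u* minimising w_d*(x + DT*u - d_ref)**2 + W_U*u**2.

    x is the current lateral offset; (w_d, d_ref) are the cost
    parameters that an upstream network would predict.
    """
    return w_d * DT * (d_ref - x) / (w_d * DT**2 + W_U)


def solution_sensitivity(x, w_d, d_ref):
    """du*/dw_d and du*/dd_ref via the implicit function theorem.

    With g(u, p) = dJ/du = 0 at the optimum, du*/dp = -(dg/du)^-1 * dg/dp.
    """
    u = solve_ocp(x, w_d, d_ref)
    g_u = 2.0 * (w_d * DT**2 + W_U)          # dg/du
    g_wd = 2.0 * DT * (x + DT * u - d_ref)   # dg/dw_d
    g_dref = -2.0 * w_d * DT                 # dg/dd_ref
    return -g_wd / g_u, -g_dref / g_u


def bc_loss_and_grad(x, w_d, d_ref, u_demo):
    """Squared imitation error and its gradient w.r.t. (w_d, d_ref)."""
    u = solve_ocp(x, w_d, d_ref)
    du_dwd, du_ddref = solution_sensitivity(x, w_d, d_ref)
    err = u - u_demo
    return err**2, (2.0 * err * du_dwd, 2.0 * err * du_ddref)


def fit_offset(x, u_demo, w_d=1.0, d_ref=0.0, lr=0.05, steps=200):
    """Gradient descent on d_ref only, keeping w_d fixed."""
    history = []
    for _step in range(steps):
        loss, (_g_wd, g_dref) = bc_loss_and_grad(x, w_d, d_ref, u_demo)
        history.append(loss)
        d_ref -= lr * g_dref
    return d_ref, history


if __name__ == "__main__":
    # Synthetic "demonstration" generated with a known offset of -0.2.
    u_demo = solve_ocp(0.3, 1.0, -0.2)
    fitted, history = fit_offset(0.3, u_demo)
    print(f"recovered offset: {fitted:.4f}, final loss: {history[-1]:.2e}")
```

In a full pipeline the offset would be the output of an image encoder rather than a free scalar, and the gradient returned by `bc_loss_and_grad` would be backpropagated into the encoder's weights. The closed-form solve is only possible because the toy problem is quadratic and unconstrained; a constrained NMPC would need a numerical solver and sensitivities of its optimality conditions.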

Paper Structure (17 sections, 9 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: DriViDOC architecture: at time $t$, the visual information encoded by the neural network $\chi_t$ is used to compute the parameters $\mathbf{p}_t$ of an NMPC, which controls a high-fidelity vehicle dynamics model used for human-in-the-loop testing on a hexapod platform. The driving model is learned end-to-end from pixels to control, based on human demonstrations collected offline on the platform.
  • Figure 2: Illustrations of four different driving styles present in the dataset. In (a): the Gaussian probability distributions fitted to $v_x$ for four different drivers. In (b): position with respect to the centerline (dashed line) for the same drivers while entering the first curve of the track, in global Cartesian coordinates. A miniature map of the track is included inside the plot.
  • Figure 3: Relevant states from closed-loop simulations of two DriViDOC models, compared with the corresponding driver distribution. On the right axis, we indicate the curvature with a dashed line. In the $d$ plot, the lane boundaries are shown as horizontal solid lines.
  • Figure 4: On the left axis of each plot, values of the parameters $\mathbf{p}(t)$ from closed-loop simulations of DriViDOC models trained for two different drivers. The offsets $\bar{d}$ and $\bar{v_x}$ are normalized between -1 and 1. On the right axis, we indicate the curvature with a dashed line.
  • Figure 5: DriViDOC vs TRACK baseline for Driver01 ($v_x$ and $a_x$).
  • ...and 1 more figure