Integrating Decision-Making Into Differentiable Optimization Guided Learning for End-to-End Planning of Autonomous Vehicles

Wenru Liu, Yongkang Song, Chengzhen Meng, Zhiyu Huang, Haochen Liu, Chen Lv, Jun Ma

TL;DR

The paper tackles end-to-end autonomous-vehicle planning by embedding a decision-making component into a differentiable optimization framework that jointly optimizes lane decisions and ego-vehicle trajectories. It combines a transformer-based motion predictor with a differentiable optimizer and a kinematic bicycle vehicle model to create an end-to-end trainable pipeline trained on the Waymo Open Motion Dataset. The main contributions are (i) a differentiable constrained optimization formulation for lane decisions and planning with learned initialization, (ii) a bilevel training scheme that backpropagates through the optimizer, and (iii) extensive open-loop and closed-loop evaluations plus thorough ablation analysis. The results show improved safety, traveling efficiency, and riding comfort compared to baselines, and demonstrate that optimized decisions can yield superior driving performance even when diverging from expert demonstrations.
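The kinematic bicycle model mentioned above can be illustrated with a minimal sketch. It assumes the standard rear-axle form with forward-Euler integration and uses acceleration and steering angle as controls; the `State` class, the function names, the 2.8 m wheelbase, and the 0.1 s time step are illustrative choices, not values taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    x: float        # position along the x-axis [m]
    y: float        # position along the y-axis [m]
    heading: float  # yaw angle [rad]
    speed: float    # longitudinal speed [m/s]


def bicycle_step(state, accel, steer, wheelbase=2.8, dt=0.1):
    """Advance the state by one forward-Euler step of the kinematic bicycle model.

    wheelbase and dt are assumed values chosen for illustration only.
    """
    x = state.x + state.speed * math.cos(state.heading) * dt
    y = state.y + state.speed * math.sin(state.heading) * dt
    heading = state.heading + state.speed / wheelbase * math.tan(steer) * dt
    speed = state.speed + accel * dt
    return State(x, y, heading, speed)


def rollout(state, controls, wheelbase=2.8, dt=0.1):
    """Roll a sequence of (accel, steer) controls out into a list of states."""
    trajectory = [state]
    for accel, steer in controls:
        state = bicycle_step(state, accel, steer, wheelbase, dt)
        trajectory.append(state)
    return trajectory


# Usage: one second of driving straight, then one second of a gentle left turn.
start = State(x=0.0, y=0.0, heading=0.0, speed=10.0)
straight = rollout(start, [(0.0, 0.0)] * 10)
left_turn = rollout(straight[-1], [(0.0, 0.05)] * 10)
```

Every operation in the rollout is smooth in the controls, so a trajectory generated this way can sit inside a gradient-based optimizer; an automatic-differentiation framework would replace the `math` calls in practice.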

Abstract

We address the decision-making capability within an end-to-end planning framework that focuses on motion prediction, decision-making, and trajectory planning. Specifically, we formulate decision-making and trajectory planning as a differentiable nonlinear optimization problem, which ensures compatibility with learning-based modules to establish an end-to-end trainable architecture. This optimization introduces explicit objectives related to safety, traveling efficiency, and riding comfort, guiding the learning process in our proposed pipeline. Intrinsic constraints resulting from the decision-making task are integrated into the optimization formulation and preserved throughout the learning process. By integrating the differentiable optimizer with a neural network predictor, the proposed framework is end-to-end trainable, aligning various driving tasks with ultimate performance goals defined by the optimization objectives. The proposed framework is trained and validated using the Waymo Open Motion dataset. The open-loop testing reveals that while the planning outcomes using our method do not always resemble the expert trajectory, they consistently outperform baseline approaches with improved safety, traveling efficiency, and riding comfort. The closed-loop testing further demonstrates the effectiveness of optimizing decisions and improving driving performance. Ablation studies demonstrate that the initialization provided by the learning-based prediction module is essential for the convergence of the optimizer as well as the overall driving performance.
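The structure described in the abstract can be written generically as a bilevel problem. The symbols below are illustrative assumptions rather than the paper's notation: $\theta$ denotes the predictor's weights, $\mathbf{u}$ the decision and control variables, $J$ the inner objective covering safety, traveling efficiency, and riding comfort, $\mathcal{C}$ the feasible set induced by the decision-making constraints, and $\mathcal{L}$ the outer training loss.

$$
\min_{\theta}\ \mathcal{L}\big(\mathbf{u}^{*}(\theta)\big)
\quad \text{s.t.} \quad
\mathbf{u}^{*}(\theta) = \arg\min_{\mathbf{u} \in \mathcal{C}}\ J(\mathbf{u};\theta)
$$

$$
\frac{d\mathcal{L}}{d\theta} = \frac{\partial \mathcal{L}}{\partial \mathbf{u}^{*}}\,\frac{\partial \mathbf{u}^{*}}{\partial \theta}
$$

In this generic reading, end-to-end trainability requires the inner solve to be differentiable, so that $\partial \mathbf{u}^{*} / \partial \theta$ exists and the outer loss can propagate gradients back to the predictor.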

Paper Structure

This paper contains 25 sections, 30 equations, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: Illustration of our proposed approach. (a) The end-to-end planning system leverages a modular architecture of prediction and differentiable optimization while maintaining end-to-end trainability. (b) In a multi-lane driving scenario, imitation learning methods typically produce a trajectory (yellow line) that closely aligns with the expert's path (black dotted line). In contrast, our method enables the AV to choose the optimized lane for travel and generate a corresponding trajectory (red line) through decision-making, which enhances safety and efficiency by allowing the AV to navigate toward lanes free from obstacles in this scenario. (c) The effectiveness of the proposed method is demonstrated in our experiments. Our approach (red line) prioritizes safety and efficiency by dynamically selecting obstacle-free lanes, whereas the planning outcome of [huang2023differentiable] is a trajectory (yellow line) that closely follows the expert's.
  • Figure 2: Pipeline of learning-based predictor and the differentiable optimizer for the integrated decision-making and trajectory planning tasks. The proposed AD framework is end-to-end trainable.
  • Figure 3: Comparison of the planning outcomes by DIPP and our proposed framework in the open-loop testing. The top figures show the planned trajectory, with the red solid lines showing the planned trajectories for the AV, and the black dotted lines representing the reference line from expert demonstration. The bottom figures plot the control inputs of the AV ($y$-axis) across the planning horizon in seconds ($x$-axis), with the blue line denoting the acceleration, and the yellow line denoting the steering angle.
  • Figure 4: Representative scenarios of the proposed framework in closed-loop testing. The red solid lines are the planned trajectories for the AV. Top: optimized lane-changing maneuvers with an LV in the same lane. Bottom: typical urban driving scenes, including yielding, compliance with traffic lights, car-following, and U-turns.
  • Figure 5: Comparison of the planned trajectory with and without the learned initialization of decisions in closed-loop testing.

Theorems & Definitions (1)

  • Remark 1