PP-TIL: Personalized Planning for Autonomous Driving with Instance-based Transfer Imitation Learning

Fangze Lin; Ying He; Fei Yu

TL;DR

The paper tackles personalized motion planning for urban autonomous driving when user data is scarce, by transferring knowledge from a large expert-domain corpus. It introduces PP-TIL, which pre-trains a planner on expert data and then fine-tunes it with instance-based transfer imitation learning. The fine-tuning objective combines an imitation loss $\mathcal{L}_{IL}$ with a regularization term $\mathcal{L}_{IRL}$, computed via Maximum Entropy IRL to align the planner with the user's style, giving $\mathcal{L}_{TIL}^\alpha = \mathcal{L}_{IL} + \alpha \mathcal{L}_{IRL}$. A differentiable nonlinear optimizer acts as a safety layer that refines plans during fine-tuning, and a differentiable kinematic model keeps the pipeline end-to-end differentiable. Experiments on the Waymo Open Motion Dataset show improved style matching and planning performance, with the best results obtained when roughly 75% of the fine-tuning data is expert data and $\alpha$ is sufficiently large. Limitations include the absence of closed-loop real-world validation and a reliance on trajectory-feature style metrics.
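The combined objective $\mathcal{L}_{TIL}^\alpha = \mathcal{L}_{IL} + \alpha \mathcal{L}_{IRL}$ can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the paper's implementation: the style features (mean speed and mean absolute acceleration), the squared feature-expectation gap used as a stand-in for the Maximum Entropy IRL term, and all function names are hypothetical.

```python
from typing import List, Tuple

Trajectory = List[Tuple[float, float]]  # (x, y) waypoints, at least 3 per trajectory


def imitation_loss(planned: Trajectory, demo: Trajectory) -> float:
    """Mean squared waypoint error between a planned and a demonstrated trajectory."""
    assert len(planned) == len(demo)
    sq_err = [(px - dx) ** 2 + (py - dy) ** 2
              for (px, py), (dx, dy) in zip(planned, demo)]
    return sum(sq_err) / len(sq_err)


def style_features(traj: Trajectory, dt: float) -> List[float]:
    """Hypothetical style features: mean speed and mean absolute acceleration."""
    speeds = [((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 / dt
              for (x0, y0), (x1, y1) in zip(traj, traj[1:])]
    accels = [abs(v1 - v0) / dt for v0, v1 in zip(speeds, speeds[1:])]
    return [sum(speeds) / len(speeds), sum(accels) / len(accels)]


def feature_matching_loss(planned: List[Trajectory],
                          user: List[Trajectory], dt: float) -> float:
    """Assumed stand-in for L_IRL: squared gap between mean style features."""
    def mean_features(batch: List[Trajectory]) -> List[float]:
        feats = [style_features(t, dt) for t in batch]
        return [sum(col) / len(feats) for col in zip(*feats)]

    return sum((p - u) ** 2
               for p, u in zip(mean_features(planned), mean_features(user)))


def til_loss(planned: List[Trajectory], demos: List[Trajectory],
             user: List[Trajectory], alpha: float, dt: float = 0.1) -> float:
    """L_TIL = L_IL + alpha * L_IRL, with L_IL averaged over the batch."""
    il = sum(imitation_loss(p, d) for p, d in zip(planned, demos)) / len(planned)
    irl = feature_matching_loss(planned, user, dt)
    return il + alpha * irl
```

In this sketch, a larger `alpha` pulls the planned trajectories' average features toward the user's, while the imitation term keeps them close to the demonstrations in the batch.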

Abstract

Personalized motion planning is important in urban automated driving because it caters to the requirements of individual users. However, prior work has struggled to address two aspects at once: personalized planning in complex urban settings, and improving planning performance through data. The difficulty is that user data is expensive and limited while the scene state space is effectively infinite, which leads to overfitting and poor generalization during model training. We therefore propose an instance-based transfer imitation learning approach that transfers knowledge from extensive expert-domain data to the user domain, addressing these issues at their root. We first pre-train a model on large-scale expert data. Then, during fine-tuning, we feed the model batches that combine expert and user data. Using inverse reinforcement learning, we extract the style feature distribution from user demonstrations and use it to construct a regularization term that approximates the user's style. In extensive experiments, our approach mitigates the overfitting caused by sparse user data better than the baseline methods. Furthermore, we find that integrating the driving model with a differentiable nonlinear optimizer, used as a safety protection layer for end-to-end personalized fine-tuning, yields superior planning performance.
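The fine-tuning batches that combine expert and user data can be sketched as follows. This is a minimal illustration, not the authors' data loader: the function name, the per-batch `expert_ratio` parameter, and sampling with replacement (so that a small user dataset can still fill its share of every batch) are all assumptions.

```python
import random
from typing import Any, List, Optional, Sequence, Tuple


def sample_mixed_batch(expert_data: Sequence[Any],
                       user_data: Sequence[Any],
                       batch_size: int,
                       expert_ratio: float = 0.75,
                       rng: Optional[random.Random] = None) -> List[Tuple[str, Any]]:
    """Draw one batch with a fixed share of expert samples (hypothetical helper).

    Each element is tagged with its domain so a loss function can treat
    expert and user samples differently.
    """
    rng = rng or random.Random()
    n_expert = round(batch_size * expert_ratio)
    n_user = batch_size - n_expert
    # Sampling with replacement: sparse user data may repeat within a batch.
    batch = [("expert", s) for s in rng.choices(expert_data, k=n_expert)]
    batch += [("user", s) for s in rng.choices(user_data, k=n_user)]
    rng.shuffle(batch)
    return batch
```

For example, `sample_mixed_batch(expert_scenes, user_scenes, batch_size=8)` would return six expert-tagged and two user-tagged samples in random order, where `expert_scenes` and `user_scenes` are placeholder collections.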

Paper Structure (23 sections, 8 equations, 4 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Machine Learning-based Planning for Autonomous Driving
Personalized Planning for Autonomous Driving
Instance-Based Transfer Learning
Personalized Planning via Transfer Imitation Learning
Problem Formulation
Pre-Training with Large-Scale Expert Data
Differentiable Kinematic Model
Neural Network
Fine-Tuning with Instance-Based Transfer Imitation Learning
Differentiable Nonlinear Optimization
Maximum Entropy Inverse Reinforcement Learning
Instance-Based Transfer Imitation Learning
Experiments
...and 8 more sections

Figures (4)

Figure 1: Approaches to learning driving style can generally be categorized into two main groups: (a) The first class involves using inverse reinforcement learning to learn cost functions from demonstrations. (b) The second class involves using imitation learning to learn neural networks from demonstrations. (c) Nevertheless, these methods often encounter overfitting and poor generalization challenges when learning from sparse user demonstrations. To address these challenges, we propose an approach termed Personalized Planning via Transfer Imitation Learning (PP-TIL).
Figure 2: Personalized planning framework based on transfer imitation learning. The structural design of the neural network module shown in the figure follows DIPP [huang2023differentiable]. During the pre-training stage, we train the neural network on extensive expert data. In the fine-tuning phase, we initialize the neural network with the pre-trained parameters and feed the model input batches that combine expert and user data. The differentiable motion planner module can optionally be placed at the output of the neural network for end-to-end fine-tuning. We use maximum entropy inverse reinforcement learning to construct $\mathcal{L}_{IRL}$, a term that matches the user's trajectory feature expectation and thereby learns the user's trajectory style. $\mathcal{L}_{IL}$ minimizes the empirical error of the trajectories, which keeps the planning effective.
Figure 3: Visualization of the final results. We compare the user's real trajectory with the output trajectory of three different models. Different categories are visually distinguished using various colors in the diagram. The color scheme for the rectangles is depicted as follows: autonomous vehicle, predicted vehicle, other vehicle, crosswalk and speed bump. The color scheme for the lines is depicted as follows: planned trajectory, predicted trajectory and road edges. In particular, the black dotted line represents the ground-truth trajectory. The red circle in the figure aids in better distinguishing the differences between different approaches.
Figure 4: Comparison of different steps. The figure shows average values over three repeated experiments. We perform 1000 parameter updates for each model, halving the learning rate every 200 steps. The parentheses indicate the fine-tuning framework used, with the outer parentheses denoting the part whose parameters are updated.
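The learning-rate schedule described in the Figure 4 caption (1000 updates, halved every 200 steps) corresponds to a step decay. The sketch below is illustrative only: the function name and the base learning rate of `1e-3` are assumptions, since the caption does not state the initial value.

```python
def step_decay_lr(step: int, base_lr: float,
                  decay_every: int = 200, factor: float = 0.5) -> float:
    """Learning rate after `step` updates, multiplied by `factor` every `decay_every` steps."""
    return base_lr * factor ** (step // decay_every)


# Assumed base learning rate; the caption gives only the decay rule.
schedule = [step_decay_lr(step, base_lr=1e-3) for step in range(1000)]
```

Over 1000 updates the rate is halved four times, so the last 200 steps run at one sixteenth of the base rate.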
