PP-TIL: Personalized Planning for Autonomous Driving with Instance-based Transfer Imitation Learning
Fangze Lin, Ying He, Fei Yu
TL;DR
The paper tackles personalized motion planning in urban autonomous driving when user data is scarce, by transferring knowledge from a large expert-domain corpus. It introduces PP-TIL, which pre-trains a planner on expert data and then fine-tunes it with instance-based transfer imitation learning. The fine-tuning objective combines an imitation loss $\mathcal{L}_{IL}$ with a regularization term $\mathcal{L}_{IRL}$, computed via Maximum Entropy IRL to align the planner with the user's style: $\mathcal{L}_{TIL}^\alpha = \mathcal{L}_{IL} + \alpha \mathcal{L}_{IRL}$. A differentiable nonlinear optimizer acts as a safety layer that refines plans during fine-tuning, and a differentiable kinematic model keeps the pipeline end-to-end differentiable. Experiments on the Waymo Open Motion Dataset show improved style matching and planning performance, with the best results achieved when mixing roughly 75% expert data and using a sufficiently large $\alpha$. Limitations include the absence of closed-loop real-world validation and the reliance on trajectory-feature style metrics.
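As a rough illustration of how the combined objective $\mathcal{L}_{TIL}^\alpha = \mathcal{L}_{IL} + \alpha \mathcal{L}_{IRL}$ weighs its two terms, the sketch below computes it for a toy trajectory. The function names, the mean squared displacement used for $\mathcal{L}_{IL}$, and the squared feature gap used as a stand-in for $\mathcal{L}_{IRL}$ are all assumptions made for illustration; the summary does not specify the exact form of either term, and this is not the paper's Maximum Entropy IRL computation.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def imitation_loss(plan: Sequence[Point], demo: Sequence[Point]) -> float:
    """Assumed form of L_IL: mean squared displacement between waypoints."""
    if len(plan) != len(demo) or not plan:
        raise ValueError("plan and demo must be non-empty and of equal length")
    total = sum((px - dx) ** 2 + (py - dy) ** 2
                for (px, py), (dx, dy) in zip(plan, demo))
    return total / len(plan)


def style_feature_gap(plan_features: Sequence[float],
                      user_features: Sequence[float]) -> float:
    """Hypothetical stand-in for L_IRL: squared gap between the style
    features of the plan and those extracted from user demonstrations."""
    if len(plan_features) != len(user_features):
        raise ValueError("feature vectors must have equal length")
    return sum((p - u) ** 2 for p, u in zip(plan_features, user_features))


def til_loss(il: float, irl: float, alpha: float) -> float:
    """Combined objective: L_TIL^alpha = L_IL + alpha * L_IRL."""
    return il + alpha * irl


# Toy example: a two-waypoint plan and two style features
# (for instance, mean speed and mean acceleration; hypothetical choices).
il = imitation_loss([(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (1.0, 1.0)])
irl = style_feature_gap([1.0, 2.0], [1.0, 4.0])
loss = til_loss(il, irl, alpha=0.5)
```

A larger $\alpha$ gives the style term more influence relative to the imitation term, which matches the summary's observation that a sufficiently large $\alpha$ is needed for good style matching.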
Abstract
Personalized motion planning is of significant importance in urban automated driving, as it caters to the unique requirements of individual users. Nevertheless, prior work has frequently struggled to address two crucial aspects at once: personalized planning in intricate urban settings and improving planning performance through data utilization. The challenge arises because user data is expensive and limited, while the scene state space is effectively infinite. Together, these factors cause overfitting and poor generalization during model training. We therefore propose an instance-based transfer imitation learning approach. This method transfers knowledge from extensive expert-domain data to the user domain, addressing these issues at their root. We first pre-train a model on large-scale expert data. Then, during the fine-tuning phase, we feed the model batches that comprise both expert and user data. Using inverse reinforcement learning, we extract the style feature distribution from user demonstrations and use it to construct a regularization term that approximates the user's style. We evaluated the proposed method extensively in our experiments. Compared to the baseline methods, our approach mitigates the overfitting caused by sparse user data. Furthermore, we found that integrating the driving model with a differentiable nonlinear optimizer as a safety protection layer for end-to-end personalized fine-tuning results in superior planning performance.
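The mixed-batch step of the fine-tuning phase can be sketched as follows. This is a minimal illustration, assuming the expert share is applied per batch and that the scarce user samples are drawn with replacement; the function name `mixed_batch`, its parameters, and the sampling scheme are hypothetical and not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple


def mixed_batch(expert: Sequence[object],
                user: Sequence[object],
                batch_size: int,
                expert_ratio: float,
                rng: random.Random) -> List[Tuple[str, object]]:
    """Build one fine-tuning batch that mixes expert and user samples.

    Each item is tagged with its domain so a training loop could apply
    the style regularizer only where it is meaningful (an assumption).
    """
    if not 0.0 <= expert_ratio <= 1.0:
        raise ValueError("expert_ratio must lie in [0, 1]")
    n_expert = round(batch_size * expert_ratio)
    n_user = batch_size - n_expert
    # Sampling with replacement, since user data is assumed to be scarce.
    batch = [("expert", x) for x in rng.choices(expert, k=n_expert)]
    batch += [("user", x) for x in rng.choices(user, k=n_user)]
    rng.shuffle(batch)
    return batch


# Toy usage: 100 expert scenarios, 2 user scenarios, 75% expert share.
batch = mixed_batch(list(range(100)), ["u1", "u2"], batch_size=8,
                    expert_ratio=0.75, rng=random.Random(0))
n_expert_items = sum(1 for domain, _ in batch if domain == "expert")
```

Keeping expert samples in every batch is one simple way to keep the fine-tuned model anchored to the expert domain while it adapts to a small set of user demonstrations.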
