Learning the References of Online Model Predictive Control for Urban Self-Driving

Yubin Wang; Zengqi Peng; Yusen Xie; Yulin Li; Hakim Ghazzai; Jun Ma

TL;DR

The paper tackles safe and efficient urban autonomous driving in dynamic traffic by combining model-based MPC with a step-based DRL policy that outputs instantaneous references to modulate the MPC cost functions. It introduces a learnable reference state $\boldsymbol{x}_{ref}$ and a learnable weight $\boldsymbol{Q}_{ref}$, forming a cost term $J_{ref,k}$ that latently encodes safety, and uses SAC to learn real-time references from partial sensor observations. The approach is evaluated in CARLA with nine traffic participants and is reported to outperform baselines in safety, speed, and computational efficiency; it also achieves zero-shot sim-to-real transfer and shows some robustness to noise and vehicle type changes. The authors release open-source code, and the results suggest potential for scalable, real-time, safety-aware planning in complex urban environments; future directions include broader generalization and robustness enhancements.

Abstract

In this work, we propose a novel learning-based model predictive control (MPC) framework for motion planning and control in urban self-driving. In this framework, the instantaneous references and cost functions of the online MPC are learned from raw sensor data without relying on any oracle or predicted states of traffic. Moreover, driving safety conditions are latently encoded through a learnable instantaneous reference vector. In particular, we implement a deep reinforcement learning (DRL) framework for policy search, in which practical and lightweight raw observations are processed to reason about the traffic and provide the online MPC with instantaneous references. The proposed approach is validated in a high-fidelity simulator, where it adapts well to complex and dynamic traffic. Furthermore, sim-to-real deployments are conducted to evaluate the generalizability of the proposed framework in various real-world applications. We provide the open-source code and video demonstrations at the project website: https://latent-mpc.github.io/.
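
To make the idea of a learned reference term concrete, the following is a minimal sketch and not the authors' implementation. It assumes the reference cost is a quadratic penalty on the deviation of the predicted state from the learned reference, with a diagonal, non-negative weight; this page does not state the exact form of $J_{ref,k}$. The function name `reference_cost`, the three-element state layout, and all numeric values are hypothetical.

```python
import numpy as np


def reference_cost(x_k, x_ref, q_ref_diag):
    """Assumed quadratic form: (x_k - x_ref)^T Q_ref (x_k - x_ref), Q_ref diagonal."""
    x_k = np.asarray(x_k, dtype=float)
    x_ref = np.asarray(x_ref, dtype=float)
    q = np.asarray(q_ref_diag, dtype=float)
    if np.any(q < 0):
        # Negative weights would make the penalty non-convex.
        raise ValueError("Q_ref weights must be non-negative")
    err = x_k - x_ref
    return float(err @ (q * err))


# Hypothetical state layout: [longitudinal position, lateral offset, speed].
x_k = [10.0, 0.5, 8.0]      # state predicted by the MPC at step k
x_ref = [12.0, 0.0, 8.0]    # stand-in for a reference produced by the policy
q_ref = [1.0, 4.0, 0.0]     # stand-in for weights produced by the policy
cost = reference_cost(x_k, x_ref, q_ref)
print(cost)
```

Under this assumption, a policy could steer the MPC away from an obstacle by moving `x_ref` and raising the corresponding entries of `q_ref`, without adding explicit collision constraints to the optimization problem.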

Paper Structure (18 sections, 18 equations, 6 figures, 5 tables, 2 algorithms)

Introduction
Related Works
Preliminaries and Problem Statement
Learning-based Model Predictive Control
Online MPC Reformulation with Instantaneous References
Real-Time Decision Variable Generation
Learning the Decision Variables via DRL
Partial Observation
Action
Reward
Policy Training
Experiments
Implementation Setup
Driving Performance
Comparison Analysis
...and 3 more sections

Figures (6)

Figure 1: Overview of the proposed framework for urban self-driving. A policy network is trained to produce instantaneous decision variables for low-level online MPC, whose cost functions are modulated to latently encode the safety conditions of collision avoidance and drivable surface boundaries.
Figure 2: Illustration of the pose transformation $\mathcal{T}$ from the global coordinate frame $\mathcal{W}_g$ to the centerline reference coordinate frame $\mathcal{W}_\mathrm{ref}$.
Figure 3: Visualization of learned positions in decision variables for different driving scenarios.
Figure 4: Key frames of trials with our framework in complex and dynamic traffic environments. Top: agile overtaking. Bottom: emergent collision avoidance. The left side of each subfigure is a bird's-eye-view image in which the red rectangle is the ego vehicle and the green rectangles represent other traffic participants.
Figure 5: Key frames of a trial in which the trained model is transferred to a new type of vehicle.
...and 1 more figures
