Speeding Up Path Planning via Reinforcement Learning in MCTS for Automated Parking

Xinlong Zheng; Xiaozhou Zhang; Donghao Xu

TL;DR

The paper tackles real-time automated parking by coupling reinforcement learning with Monte Carlo Tree Search to produce fast, high-quality plans. Parking is modeled as a low-speed MDP with a bicycle kinematic model, where observations come from occupancy grids and rewards encode safety, comfort, and efficiency; a neural evaluator provides priors and value estimates to the MCTS. A policy-value network is trained online from MCTS outcomes, guiding future searches and reducing reliance on human driver data. Empirical results on simulated and real parking tasks show substantial speedups over Hybrid A*, with median planning times reduced to around 7% of the baseline and robustness across discretization schemes.
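The TL;DR mentions a bicycle kinematic model. Below is a minimal sketch of one transition. It assumes the common rear-axle-referenced kinematic bicycle model, integrated with a forward Euler step. The wheelbase of 2.8 m, the 0.1 s step, and the names `VehicleState` and `bicycle_step` are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class VehicleState:
    x: float      # rear-axle x position [m]
    y: float      # rear-axle y position [m]
    theta: float  # heading [rad]


def bicycle_step(state: VehicleState, v: float, delta: float,
                 wheelbase: float = 2.8, dt: float = 0.1) -> VehicleState:
    """Advance a kinematic bicycle model (rear-axle reference) by one Euler step.

    v     : longitudinal speed [m/s]; negative values mean reversing
    delta : front-wheel steering angle [rad]
    wheelbase and dt are hypothetical defaults, not values from the paper.
    """
    x = state.x + v * math.cos(state.theta) * dt
    y = state.y + v * math.sin(state.theta) * dt
    theta = state.theta + v * math.tan(delta) / wheelbase * dt
    # Wrap the heading into [-pi, pi] so angle comparisons stay consistent
    theta = math.atan2(math.sin(theta), math.cos(theta))
    return VehicleState(x, y, theta)
```

A negative `v` models reversing, which parking maneuvers require. A planner that discretizes its actions would call this step with a fixed set of (speed, steering) pairs.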

Abstract

In this paper, we present a method that integrates reinforcement learning into Monte Carlo tree search to speed up online path planning in fully observable environments for automated parking tasks. Sampling-based planning methods in high-dimensional spaces can be computationally expensive and time-consuming. State evaluation methods help by injecting prior knowledge into the search steps, which makes planning faster in a real-time system. Because automated parking tasks are often executed in complex environments, a solid yet lightweight heuristic is difficult to design analytically. To overcome this limitation, we propose a reinforcement learning pipeline combined with Monte Carlo tree search within the path planning framework. By iteratively learning, from the outcomes of the previous cycle, the value of a state and the best action among the sampled actions, we obtain a value estimator and a policy generator for a given state. This yields a mechanism that balances exploration and exploitation, speeding up the path planning process while maintaining its quality without using human expert driver data.
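Figure 1 states that actions are selected by PUCT (rosin2011multi), and this selection rule is where the search trades off exploration against exploitation. The sketch below assumes the AlphaZero-style form $Q(s,a) + c_{\text{puct}}\,P(s,a)\,\sqrt{N(s)}/(1+N(s,a))$, where $P(s,a)$ is the prior from the policy head. It also assumes that one scalar value per simulation is backed up unchanged along the selection path. The exact PUCT variant, the constant `c_puct = 1.5`, and the names `Node`, `puct_score`, `select_action`, and `backpropagate` are hypothetical rather than taken from the paper.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                  # P(s, a) from the policy head
    visit_count: int = 0          # N(s, a)
    value_sum: float = 0.0        # W(s, a), sum of backed-up values
    children: dict = field(default_factory=dict)  # action -> Node

    def q_value(self) -> float:
        # Mean action value Q(s, a); 0 for unvisited nodes (an assumed convention)
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    # Exploitation term Q plus a prior-weighted exploration bonus (hypothetical c_puct)
    bonus = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.q_value() + bonus


def select_action(parent: Node, c_puct: float = 1.5):
    # Return the child action with the highest PUCT score
    return max(parent.children,
               key=lambda a: puct_score(parent, parent.children[a], c_puct))


def backpropagate(path: list, value: float) -> None:
    # Add the leaf value estimate to every node on the selection path
    for node in path:
        node.visit_count += 1
        node.value_sum += value
```

A large prior makes an unvisited action attractive. The $1+N(s,a)$ denominator shrinks that bonus as visits accumulate, so the search gradually shifts from the network's suggestions toward observed values.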

Paper Structure (17 sections, 6 equations, 6 figures, 1 algorithm)

INTRODUCTION
RELATED WORKS
PRELIMINARIES
  Problem Formulation
  Monte Carlo Tree Search
METHODOLOGY
  Monte Carlo Tree Search Design
    Selection
    Expansion
    Simulation
    Backpropagation
  Reinforcement Learning Design
EXPERIMENTS
  Parallel parking with narrow spot in length
  Perpendicular parking under complex environment
...and 2 more sections

Figures (6)

Figure 1: Reinforcement learning integrated into MCTS for path planning tasks. (a) The agent plans its moves under the guidance of MCTS. $s_0$ is the start state and $s_\text{T}$ is the destination state. $a_t$ is the action taken at time $t$, selected by PUCT (rosin2011multi). (b) Neural network training on previously produced results. $z$ is the terminated tree used to generate the training input state $s_t$, the training label $\boldsymbol{\pi}_t$, and $r_t$. $f_\theta$ is the neural network mapping $s_t$ to the policy distribution $\boldsymbol{p}_t$ and the value $v_t$.
Figure 2: Bicycle vehicle model
Figure 3: Cycled steps of MCTS in path planning. (a) Iterative tree traversal by selection. (b) Node expansion with the policy generator. (c) Simulation with the value approximator and the cost function $\mathcal{C}_{\text{path}}$. (d) Backpropagation all the way to the root node. (e) Neural network evaluator.
Figure 4: Reinforcement learning pipeline. (a) Training data is retrieved once MCTS terminates, that is, once policy improvement has finished. (b) The evaluation neural network consists of a convolutional backbone and two separate MLP heads. The updated model parameters are used in the next MCTS iteration.
Figure 5: (a) Training metric. The darker line is the median over the validation dataset, and the pale shaded area spans the 10th to 90th percentiles. The blue curve is the planning time of the Hybrid A* algorithm, while the red curve is the performance of MCTS at different training steps. (b) Planning time and success rate of the trained MCTS versus Hybrid A* under different discretization setups over the validation dataset.
...and 1 more figure
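Figure 1(b) and Figure 4(a) describe extracting training labels from the terminated tree. The sketch below assumes the AlphaZero convention. Under that convention, the policy label $\boldsymbol{\pi}_t$ is the normalized root visit-count distribution, and the loss adds a squared value error to a policy cross-entropy against the network output $\boldsymbol{p}_t$. The temperature parameter, the scalar `value_target` (the captions do not specify how it is built from $r_t$), and the function names are hypothetical.

```python
import math


def visit_count_policy(visit_counts: dict, temperature: float = 1.0) -> dict:
    """Turn root visit counts N(s_t, a) into a policy label pi_t.

    pi_t(a) is proportional to N(s_t, a) ** (1 / temperature); smaller
    temperatures sharpen the label toward the most-visited action.
    The temperature knob itself is a hypothetical choice.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    weights = {a: n ** (1.0 / temperature) for a, n in visit_counts.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no visits recorded at this state")
    return {a: w / total for a, w in weights.items()}


def policy_value_loss(pi: dict, p: dict, v: float, value_target: float,
                      eps: float = 1e-12) -> float:
    """AlphaZero-style loss: squared value error plus policy cross-entropy.

    pi: label from visit counts; p: network policy p_t; v: network value v_t.
    eps guards against log(0) for actions the network assigns zero mass.
    """
    value_loss = (value_target - v) ** 2
    policy_loss = -sum(pi[a] * math.log(p.get(a, 0.0) + eps) for a in pi)
    return value_loss + policy_loss
```

For example, visit counts of 30 and 10 at the root give the label (0.75, 0.25). With temperature 0.5 they give (0.9, 0.1).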
