HOPE: A Reinforcement Learning-based Hybrid Policy Path Planner for Diverse Parking Scenarios

Mingyang Jiang; Yueyuan Li; Songan Zhang; Siyuan Chen; Chunxiang Wang; Ming Yang

TL;DR

HOPE addresses parking path planning across diverse real-world scenarios by combining a reinforcement learning (RL) agent with Reeds-Shepp (RS) curves, using a transformer to fuse perceived environmental information with an action mask. Its key contributions are a hybrid policy that blends a learnable RL component with a rule-based RS policy, an action-mask mechanism that prunes unsafe actions efficiently, and a difficulty-ranking scheme for generating and evaluating scenarios. Experiments show that HOPE achieves higher planning success rates than rule-based methods and plain RL approaches, and that it generalizes across normal, complex, and extreme parking scenarios. Real-world experiments in indoor parking support its practicability. The work indicates that combining classical geometric priors with learning-based planning yields robust and efficient parking planners, and it leaves extension to dynamic environments as future work.
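The hybrid policy idea can be illustrated with a minimal sketch. It assumes one simple way of combining the two components: follow a collision-free RS path when one exists, otherwise fall back to the learned policy. Every name here (`hybrid_policy_step`, `rs_planner`, `is_collision_free`, `rl_policy`) is hypothetical, and the way HOPE actually blends the two policies is defined in the paper, not in this summary.

```python
from typing import Callable, Iterable, Sequence, Tuple

State = Tuple[float, float, float]  # (x, y, heading); an assumed state layout
Path = Sequence[State]


def hybrid_policy_step(
    state: State,
    goal: State,
    rs_planner: Callable[[State, State], Iterable[Path]],
    is_collision_free: Callable[[Path], bool],
    rl_policy: Callable[[State], Tuple[float, float]],
):
    """Prefer a feasible geometric path; otherwise ask the learned policy.

    Returns ("rs", path) when a candidate path passes the collision check,
    else ("rl", action) where action is whatever rl_policy returns.
    """
    for path in rs_planner(state, goal):
        if is_collision_free(path):
            return "rs", path
    return "rl", rl_policy(state)


# Toy stand-ins (hypothetical): one straight-line candidate and a fixed action.
def toy_planner(start: State, goal: State):
    return [[start, goal]]


def toy_policy(state: State):
    return (0.0, 0.5)  # (steering, step length), an assumed action layout


start, goal = (0.0, 0.0, 0.0), (5.0, 0.0, 0.0)
blocked = hybrid_policy_step(start, goal, toy_planner, lambda p: False, toy_policy)
clear = hybrid_policy_step(start, goal, toy_planner, lambda p: True, toy_policy)
```

With the collision check always failing, the call falls through to the learned policy; with it always passing, the geometric path is returned unchanged.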

Abstract

Automated parking is a highly anticipated application of autonomous driving technology. However, existing path-planning methods fall short of this need because they cannot handle the diverse and complex parking scenarios found in reality. Non-learning methods provide reliable planning results but struggle in intricate scenarios, whereas learning-based methods explore well but converge unstably to feasible solutions. To leverage the strengths of both approaches, we introduce Hybrid pOlicy Path plannEr (HOPE). This novel solution integrates a reinforcement learning agent with Reeds-Shepp curves, enabling effective planning across diverse scenarios. HOPE guides the exploration of the reinforcement learning agent by applying an action mask mechanism and employs a transformer to integrate the perceived environmental information with the mask. To facilitate the training and evaluation of the proposed planner, we propose a criterion for categorizing the difficulty level of parking scenarios based on space and obstacle distribution. Experimental results demonstrate that our approach outperforms typical rule-based algorithms and traditional reinforcement learning methods, showing higher planning success rates and better generalization across various scenarios. We also conduct real-world experiments to verify the practicability of HOPE. The code for our solution is openly available at https://github.com/jiamiya/HOPE.
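The action mask mechanism can be illustrated with a minimal sketch of masked sampling probabilities. It assumes a discrete action set and a boolean mask marking which actions are allowed. Both are simplifications for illustration: the abstract does not specify HOPE's action space or how its mask is computed, and `masked_action_probs` is a hypothetical helper, not a function from the HOPE repository.

```python
import math
from typing import List, Sequence


def masked_action_probs(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax over allowed actions only; masked actions get probability 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    allowed = [l for l, m in zip(logits, mask) if m]
    if not allowed:
        raise ValueError("at least one action must be allowed")
    peak = max(allowed)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


# Example: the middle action is assumed to lead to a collision, so it is masked.
probs = masked_action_probs([1.0, 2.0, 3.0], [True, False, True])
```

Because masked actions receive exactly zero probability, the agent never samples them during exploration, which is the sense in which a mask can guide exploration toward safe actions.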

Paper Structure (39 sections, 31 equations, 8 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Non-learning-based parking path-planning
Learning-based parking path-planning
Preliminaries
Reinforcement learning for path planning
PPO
SAC
Reeds-Shepp curve
Methodology
Overview of architecture
Hybrid policy
Rule-based policy from Reeds-Shepp curve
Exploring and learning with RS policy
Action mask
...and 24 more sections

Figures (8)

Figure 1: The overall structure of the proposed method, including the interaction loop with the simulator (left) and the network architecture (right).
Figure 2: The description for ranking parameters in vertical parking (left) and parallel parking (right).
Figure 3: The episode reward curves (a) and success rate (b).
Figure 4: The visualization of the planning process and results of Hybrid A* (a-1)-(e-1) and the proposed HOPE (a-2)-(e-2). The blue-to-green gradient rectangles represent the states explored during the algorithm's search process, while the yellow curves indicate the path planning result. In the normal parallel parking case shown in (a), both methods provide concise path planning results. In the vertical parking scenario shown in (b) and the normal DLP scenario in (d), although both methods succeed in planning, our approach yields more reasonable results. In the narrow parallel parking scenario (c) and the scenario requiring parking with the front of the vehicle facing inward (e), Hybrid A* fails to plan, while our approach succeeds.
Figure 5: Success rate using the shortest K RS curves.
...and 3 more figures
