ParkingE2E: Camera-based End-to-end Parking Network, from Images to Planning

Changze Li, Ziheng Ji, Zhe Chen, Tong Qin, Ming Yang

TL;DR

This work tackles autonomous parking with a camera-based end-to-end neural planner that converts surround-view RGB images into Bird's Eye View (BEV) features and fuses them with target-slot features through a target query. An autoregressive transformer decoder then predicts future waypoints as serialized tokens, which a cascaded PID controller executes for lateral and longitudinal motion. Real-vehicle experiments across four garages show an average parking success rate of 87.8%, and ablations confirm that BEV fusion and target-query attention outperform the baselines. The authors acknowledge a remaining gap to rule-based methods and outline future directions, including reinforcement learning, detailed negative sampling, and advanced simulators, to improve robustness and generalization.
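
As a rough illustration of what "waypoints as serialized tokens" could look like, the sketch below quantizes each (x, y) coordinate into an integer bin and brackets the sequence with start and end tokens. The coordinate range, the number of bins, and the special-token ids are assumptions made for illustration; they are not values taken from the paper.

```python
from typing import List, Tuple

# Hypothetical settings (not from the paper).
XY_MAX = 10.0                        # assumed: waypoints lie in [-XY_MAX, XY_MAX] meters
NUM_BINS = 1000                      # assumed: quantization bins per coordinate
BOS, EOS = NUM_BINS, NUM_BINS + 1    # assumed: special-token ids placed after the bins


def coord_to_token(v: float) -> int:
    """Clamp a coordinate to the assumed range and map it to a bin in [0, NUM_BINS - 1]."""
    v = max(-XY_MAX, min(XY_MAX, v))
    return round((v + XY_MAX) / (2 * XY_MAX) * (NUM_BINS - 1))


def token_to_coord(t: int) -> float:
    """Map a bin index back to the coordinate at that bin."""
    return t / (NUM_BINS - 1) * 2 * XY_MAX - XY_MAX


def serialize(waypoints: List[Tuple[float, float]]) -> List[int]:
    """Flatten waypoints into [BOS, x0, y0, x1, y1, ..., EOS]."""
    tokens = [BOS]
    for x, y in waypoints:
        tokens += [coord_to_token(x), coord_to_token(y)]
    tokens.append(EOS)
    return tokens


def deserialize(tokens: List[int]) -> List[Tuple[float, float]]:
    """Drop special tokens and regroup the remaining bins into (x, y) pairs."""
    body = [t for t in tokens if t < NUM_BINS]
    return [(token_to_coord(body[i]), token_to_coord(body[i + 1]))
            for i in range(0, len(body) - 1, 2)]
```

With these assumed settings, a round trip through `serialize` and `deserialize` recovers each coordinate to within about half a bin width (roughly 1 cm), which is the price paid for treating trajectory prediction as a token-classification problem.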

Abstract

Autonomous parking is a crucial task in the intelligent driving field. Traditional parking algorithms are usually implemented using rule-based schemes. However, these methods are less effective in complex parking scenarios due to the intricate design of the algorithms. In contrast, neural-network-based methods tend to be more intuitive and versatile than the rule-based methods. By collecting a large amount of expert parking trajectory data and emulating human strategy via learning-based methods, the parking task can be effectively addressed. In this paper, we employ imitation learning to perform end-to-end planning, from RGB images to planned paths, by imitating human driving trajectories. The proposed end-to-end approach utilizes a target query encoder to fuse images and target features, and a transformer-based decoder to autoregressively predict future waypoints. We conducted extensive experiments in real-world scenarios, and the results demonstrate that the proposed method achieved an average parking success rate of 87.8% across four different real-world garages. Real-vehicle experiments further validate the feasibility and effectiveness of the method proposed in this paper.
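
The autoregressive prediction mentioned in the abstract can be sketched as a greedy decoding loop: each new token is chosen conditioned on the encoder output and on every token emitted so far. The `next_token_logits` callable is a hypothetical stand-in for the transformer decoder, `toy_decoder` is a dummy used only to make the loop runnable, and the token ids are illustrative assumptions.

```python
from typing import Callable, List, Sequence

BOS, EOS = 0, 1   # assumed special-token ids (hypothetical)


def greedy_decode(
    next_token_logits: Callable[[Sequence[int], List[int]], List[float]],
    memory: Sequence[int],
    max_len: int = 30,
) -> List[int]:
    """Emit tokens one at a time until EOS or max_len, feeding the prefix back in."""
    tokens = [BOS]
    for _ in range(max_len):
        logits = next_token_logits(memory, tokens)
        # Greedy choice: index of the highest-scoring token.
        nxt = max(range(len(logits)), key=logits.__getitem__)
        if nxt == EOS:
            break
        tokens.append(nxt)
    return tokens[1:]   # drop BOS


def toy_decoder(memory: Sequence[int], prefix: List[int]) -> List[float]:
    """Dummy model: replays the ids stored in `memory`, then predicts EOS."""
    vocab_size = 8
    step = len(prefix) - 1
    target = memory[step] if step < len(memory) else EOS
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


trajectory_tokens = greedy_decode(toy_decoder, [5, 3, 7])
```

In a real system the `memory` argument would be the fused BEV feature map and the decoder would be a trained network; the loop structure, in which the growing prefix is fed back at every step, is the part this sketch is meant to show.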

Paper Structure (22 sections, 8 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Illustration of the overall workflow. Our model takes the surround-view camera images and the target slot as inputs and outputs the predicted trajectory waypoints, which are later executed by the controller. Supplementary video material is available at: https://youtu.be/urOEHJH1TBQ.
  • Figure 2: Overview of our method. Multi-view RGB images are processed and the image features are transformed into a BEV representation. The target slot is used to generate the BEV target features. We fuse the target features and image BEV features using the target query. Then we obtain the predicted trajectory points one by one using the autoregressive transformer decoder.
  • Figure 3: Architecture of the target query. We add the same positional encoding to the target feature and the camera feature to establish the spatial relationship between the two types of features.
  • Figure 4: We use a Changan vehicle as the experimental platform. The vehicle utilizes Intel NUC devices to execute model inference and control.
  • Figure 5: Several different garages are utilized for training and testing the system. Some of the parking slot data from Garages I and II are used for training, while the remaining slot data from Garages I and II, together with all slot data collected from Garages III and IV, are used for testing.
  • ...and 1 more figure
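
The fusion step described in the Figure 2 and Figure 3 captions can be sketched as single-head cross-attention in which the same positional encoding is added to both feature maps before the attention scores are computed. This is a minimal pure-Python sketch under several assumptions: the target feature acts as the query and the camera feature as key and value, the encoding is a standard sinusoidal one over flattened BEV cells, and the learned projection layers of a real attention block are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]   # one row per flattened BEV cell


def positional_encoding(n: int, d: int) -> Matrix:
    """Sinusoidal encoding for n cells with feature width d (assumed form)."""
    return [
        [math.sin(p / 10000 ** (i / d)) if i % 2 == 0
         else math.cos(p / 10000 ** ((i - 1) / d))
         for i in range(d)]
        for p in range(n)
    ]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def target_query_fusion(target: Matrix, camera: Matrix) -> Matrix:
    """Cross-attention: target cells query camera cells; both share one encoding."""
    n, d = len(camera), len(camera[0])
    pos = positional_encoding(n, d)
    queries = add(target, pos)   # same encoding added to the target feature ...
    keys = add(camera, pos)      # ... and to the camera feature
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output row is a weighted average of the camera feature rows.
        fused.append([sum(weights[j] * camera[j][c] for j in range(n))
                      for c in range(d)])
    return fused
```

Because both inputs receive the identical encoding, a target cell and a camera cell at the same BEV location share a common positional component in their dot product, which is one way to read the "spatial relationship" the caption refers to.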