SunnyParking: Multi-Shot Trajectory Generation and Motion State Awareness for Human-like Parking

Jishu Miao; Han Chen; Jiankun Zhai; Qi Liu; Tsubasa Hirakawa; Takayoshi Yamashita; Hironobu Fujiyoshi

TL;DR

SunnyParking is a novel dual-branch E2E architecture that achieves motion state awareness by jointly predicting spatial trajectories and discrete motion state sequences (e.g., forward/reverse). It also introduces a Fourier feature-based representation of target parking slots to overcome the resolution limitations of traditional bird's-eye view (BEV) approaches.

Abstract

Autonomous parking fundamentally differs from on-road driving due to its frequent direction changes and complex maneuvering requirements. However, existing End-to-End (E2E) planning methods often simplify the parking task into a geometric path regression problem, neglecting explicit modeling of the vehicle's kinematic state. This "dimensionality deficiency" easily leads to physically infeasible trajectories that deviate from real human driving behavior, particularly at critical gear-shift points in multi-shot parking scenarios. In this paper, we propose SunnyParking, a novel dual-branch E2E architecture that achieves motion state awareness by jointly predicting spatial trajectories and discrete motion state sequences (e.g., forward/reverse). Additionally, we introduce a Fourier feature-based representation of target parking slots to overcome the resolution limitations of traditional bird's-eye view (BEV) approaches, enabling high-precision target interactions. Experimental results demonstrate that our framework generates more robust and human-like trajectories in complex multi-shot parking scenarios, while significantly improving gear-shift point localization accuracy compared to state-of-the-art methods. We also open-source a new parking dataset built in the CARLA simulator, specifically designed to evaluate full prediction capabilities under complex maneuvers.
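To make the idea of a Fourier feature-based target representation concrete, the following is a minimal sketch of how a slot pose $(x, y, \theta)$ could be mapped to a continuous feature vector instead of being rasterized onto a BEV grid. It is not the paper's encoder: the function names, the power-of-two frequency bands, the `pos_range` normalization, and the number of yaw harmonics are all assumptions chosen for illustration.

```python
import numpy as np


def fourier_features(value, num_bands):
    """Map a scalar to [sin(2^k * pi * v), cos(2^k * pi * v)] for k = 0..num_bands-1.

    The power-of-two frequency ladder is an assumed choice, not taken from the paper.
    """
    freqs = (2.0 ** np.arange(num_bands)) * np.pi
    angles = value * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])


def encode_target_slot(x, y, yaw, pos_range=10.0, num_pos_bands=6, num_yaw_harmonics=4):
    """Encode a target slot pose (metres, metres, radians) as a flat feature vector.

    pos_range, num_pos_bands and num_yaw_harmonics are hypothetical hyperparameters.
    """
    # Normalize position to roughly [-1, 1] before applying the frequency bands.
    xn, yn = x / pos_range, y / pos_range
    pos = np.concatenate(
        [fourier_features(xn, num_pos_bands), fourier_features(yn, num_pos_bands)]
    )
    # Yaw is periodic, so integer harmonics keep the encoding continuous across +/- pi.
    k = np.arange(1, num_yaw_harmonics + 1)
    yaw_feat = np.concatenate([np.sin(k * yaw), np.cos(k * yaw)])
    return np.concatenate([pos, yaw_feat])


if __name__ == "__main__":
    feat = encode_target_slot(1.0, 2.0, np.pi / 2)
    print(feat.shape)
```

Because the encoding is computed directly from the continuous coordinates, two slots that differ by a few centimetres still receive distinct features, whereas a coarse BEV grid may place them in the same cell.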

Paper Structure (37 sections, 24 equations, 10 figures, 4 tables)

This paper contains 37 sections, 24 equations, 10 figures, 4 tables.

Introduction
Related Works
Target Representation in Goal-Conditioned Planning
Goal-state Representation
BEV Pseudo-image Representation
Fourier Feature Representation
End-to-End Planner Paradigms
Control-based planning paradigm
Trajectory-based planning paradigm
Methodology
Preliminaries: Problem Definition
Fourier Mapping Encoder
Position Encoding
Yaw Encoding
Cross-domain Target Fusion
...and 22 more sections

Figures (10)

Figure 1: Comparison of parking planning paradigms. Geometric Plan represents many current E2E approaches, which focus primarily on geometric features. Modular Plan represents the traditional rule-based paradigm (e.g., Hybrid A*). While it generates a kinematically feasible plan, achieving human-like planning remains a challenge. Our Unified Kinematic Plan achieves comparable feasibility through explicit kinematic modeling, while demonstrating superior "human-like" qualities via its learning-based approach.
Figure 2: Overview of SunnyParking's network architecture. The network takes four RGB images as sensory input, alongside the selected target slot. The Vision Encoder extracts vision-domain features and projects them into BEV space as spatial features. The Fourier Mapping Encoder first represents the target, including its $(x, y, \theta)$, as Fourier features, then queries the spatial features. In the YUKUI module, the Trajectory Branch takes the fused features and predicts a set of waypoints. With additional input from the Trajectory Branch's hidden layer, the Motion State Branch predicts the motion state corresponding to each waypoint.
Figure 3: The architecture of the Motion State Branch. The Motion Query is enriched by the trajectory feature via cross-attention. The resulting fused queries are then input to a Transformer Decoder.
Figure 4: Progressive transition from teacher forcing to autoregressive prediction via Scheduled Sampling.
Figure 5: Qualitative comparison of parking planning results. Blue represents the GT trajectory, while red indicates the prediction. As task complexity increases, baseline methods struggle: TransFuser fails to generate feasible trajectories, while ParkingE2E exhibits overfitting to specific parking patterns. Our method produces robust trajectories with correctly predicted gear-shift points. Notably, as shown in the second 3-shot case (fig:traj_3_shot_case_2), our method demonstrates superior feature representation capabilities. It effectively handles regions near the boundaries of the BEV FOV, comparable to the direct 2D coordinate input strategy used in TransFuser.
...and 5 more figures
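
The transition from teacher forcing to autoregressive prediction described in the Figure 4 caption can be sketched as follows. This is only an illustration of Scheduled Sampling in general: the linear decay schedule, the function names `teacher_forcing_prob` and `rollout`, and the toy `predict_next` callable are assumptions, and the paper's actual schedule and decoder may differ.

```python
import random


def teacher_forcing_prob(step, total_steps, floor=0.0):
    """Probability of feeding the ground truth, decayed linearly from 1.0 to floor.

    The linear shape is an assumed choice; other decay curves are equally possible.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return max(floor, 1.0 - frac)


def rollout(predict_next, gt_seq, start, tf_prob, rng):
    """Decode len(gt_seq) steps, mixing ground truth and model outputs as decoder input.

    predict_next is a hypothetical one-step model: previous token -> next token.
    """
    prev = start
    outputs = []
    for gt in gt_seq:
        pred = predict_next(prev)
        outputs.append(pred)
        # With probability tf_prob use the ground truth, otherwise the model's own output.
        prev = gt if rng.random() < tf_prob else pred
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)
    toy_model = lambda prev: prev + 1  # stand-in for a learned decoder step
    for step in (0, 50, 100):
        p = teacher_forcing_prob(step, total_steps=100)
        print(step, p, rollout(toy_model, [10, 20, 30], 0, p, rng))
```

Early in training the decoder input is always the ground truth (pure teacher forcing); by the end it is always the model's own previous output, which matches the inference-time setting.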
