CarPLAN: Context-Adaptive and Robust Planning with Dynamic Scene Awareness for Autonomous Driving

Junyong Yun, Jungho Kim, ByungHyun Lee, Dongyoung Lee, Sehwan Choi, Seunghyeop Nam, Kichun Jo, Jun Won Choi

Abstract

Imitation learning (IL) is widely used for motion planning in autonomous driving because of its data efficiency and its ability to leverage real-world driving data. For safe and robust real-world driving, IL-based planning must capture the complex driving contexts inherent in real-world data and support context-adaptive decision-making, rather than relying solely on imitating expert trajectories. In this paper, we propose CarPLAN, a novel IL-based motion planning framework that explicitly enhances driving context understanding and enables adaptive planning across diverse traffic scenarios. Our contributions are twofold. First, we introduce Displacement-Aware Predictive Encoding (DPE), which improves the model's spatial awareness by predicting future displacement vectors between the autonomous vehicle (AV) and surrounding scene elements. This allows the planner to account for relative spacing when generating trajectories. In addition to the standard imitation loss, we add a loss term on displacement prediction errors, so that planning decisions take relative distances to other agents into account. Second, to improve the model's ability to handle diverse driving contexts, we propose the Context-Adaptive Multi-Expert Decoder (CMD), which leverages the Mixture of Experts (MoE) framework. At each Transformer layer, CMD dynamically selects the most suitable expert decoders based on scene structure, enabling adaptive, context-aware planning in dynamic environments. We evaluate CarPLAN on the nuPlan benchmark and demonstrate state-of-the-art performance across all closed-loop simulation metrics. In particular, CarPLAN performs robustly on challenging scenario sets such as Test14-Hard, validating its effectiveness in complex driving conditions. Additional experiments on the Waymax benchmark further demonstrate its ability to generalize across benchmark settings.
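The DPE training objective (imitation loss plus a displacement-prediction loss) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the displacement target at each future timestep is simply each agent's position minus the AV's position, that both terms are mean L1 losses, and that they are combined with a hypothetical weight `lam`. The names `displacement_targets`, `mean_l1`, and `augmented_loss` are hypothetical. The paper's exact formulation (for example, whether velocities are also predicted, or how the terms are weighted) is not given in this summary.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def displacement_targets(av_future: Sequence[Point],
                         agents_future: Sequence[Sequence[Point]]) -> List[List[Point]]:
    """Ground-truth displacement from the AV to each agent at every future timestep.

    av_future[t] is the AV position at step t and agents_future[i][t] is agent i's
    position; the result is disp[i][t] = agents_future[i][t] - av_future[t].
    (Assumed target definition, for illustration only.)
    """
    return [[(ax - vx, ay - vy) for (ax, ay), (vx, vy) in zip(agent, av_future)]
            for agent in agents_future]


def mean_l1(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Mean per-timestep L1 distance between two equally long point sequences."""
    return sum(abs(ax - bx) + abs(ay - by) for (ax, ay), (bx, by) in zip(a, b)) / len(a)


def augmented_loss(pred_traj: Sequence[Point], expert_traj: Sequence[Point],
                   pred_disp: Sequence[Sequence[Point]], gt_disp: Sequence[Sequence[Point]],
                   lam: float = 1.0) -> float:
    """Imitation loss plus a displacement-prediction loss weighted by hypothetical `lam`."""
    imitation = mean_l1(pred_traj, expert_traj)
    # Average the displacement error over all surrounding agents.
    displacement = sum(mean_l1(p, g) for p, g in zip(pred_disp, gt_disp)) / len(gt_disp)
    return imitation + lam * displacement
```

For example, if the planned trajectory matches the expert exactly, the imitation term is zero, but an error in the predicted spacing to a nearby agent still produces a nonzero loss. This is the kind of relational signal a pure imitation loss does not provide (compare Figure 1).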

Paper Structure (29 sections, 6 equations, 5 figures, 7 tables)

Figures (5)

  • Figure 1: Challenges in imitation learning. Minimizing imitation loss alone does not guarantee safe planning. This example shows that a trajectory with a lower L1 loss can still lead to lane departures or collisions.
  • Figure 2: Overall structure of CarPLAN. CarPLAN comprises two main networks: DPE and CMD. DPE is trained to predict the relative displacements or velocities between the AV and surrounding scene elements at each future timestep, producing Displacement-Aware Features. CMD uses multiple experts, dynamically selected by the Scene-Aware Router, to generate the AV's future trajectory.
  • Figure 3: Visualization of expert selection scores across layers. The dark red vehicle represents the AV, while the yellow and red lines indicate the ground truth (GT) and the predicted future trajectory with the highest score, respectively. The softmax scores are displayed for layers 2, 3, and 4. (a) In two distinct straight-driving scenarios, differences in the distribution of surrounding agents lead to distinct expert selections. (b) In similar driving scenarios, expert selection remains mostly consistent.
  • Figure 4: Qualitative results of closed-loop simulations on the nuPlan benchmark. The yellow trajectory is the recorded trajectory of the vehicle, while the red trajectory is the model's highest-probability predicted trajectory at each timestep. The dark red vehicle represents the AV, and black vehicles or pedestrians that turn red indicate a collision. Red dashed boxes highlight critical events observed during the simulation.
  • Figure 5: Qualitative results of closed-loop simulations on the Waymax benchmark. The blue dots show the trajectory executed during the closed-loop simulation. The blue bounding box denotes the AV, and gray bounding boxes indicate surrounding vehicles; boxes that turn red indicate a collision during the simulation. Green circles correspond to traffic lights in the green phase. Dark gray dots show the road layout, and light gray dots represent centerlines. Red dashed boxes highlight critical events observed during the simulation.
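The per-layer expert selection in CMD, performed by the Scene-Aware Router in Figure 2 and visualized as softmax scores in Figure 3, can be illustrated with a generic Mixture-of-Experts gate. This is a sketch under stated assumptions, not CarPLAN's decoder. In this sketch, the router is a hypothetical linear map from a fixed scene feature vector to one logit per expert. Only the `top_k` highest-scoring experts are evaluated, with their softmax scores renormalized, and the experts are arbitrary vector functions standing in for Transformer decoder blocks. The names `route`, `moe_layer`, and `decode` are illustrative only.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit before exponentiating)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(scene_feature: Sequence[float],
          router_weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical scene-aware router: one linear logit per expert, then softmax."""
    logits = [sum(w * x for w, x in zip(row, scene_feature)) for row in router_weights]
    return softmax(logits)


def moe_layer(query: Vector, scene_feature: Sequence[float],
              router_weights: Sequence[Sequence[float]], experts: List[Expert],
              top_k: int = 2) -> Tuple[Vector, List[float]]:
    """Run only the top_k highest-scoring experts and mix their outputs."""
    scores = route(scene_feature, router_weights)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    norm = sum(scores[i] for i in top)  # renormalize over the selected experts
    out = [0.0] * len(query)
    for i in top:
        weight = scores[i] / norm
        out = [o + weight * y for o, y in zip(out, experts[i](query))]
    return out, scores


def decode(query: Vector, scene_feature: Sequence[float],
           layers: List[Tuple[Sequence[Sequence[float]], List[Expert]]],
           top_k: int = 2) -> Tuple[Vector, List[List[float]]]:
    """Stack routed layers; each layer has its own router, so selections can differ per layer."""
    per_layer_scores = []
    for router_weights, experts in layers:
        query, scores = moe_layer(query, scene_feature, router_weights, experts, top_k)
        per_layer_scores.append(scores)
    return query, per_layer_scores
```

The returned `per_layer_scores` are analogous to the per-layer softmax scores plotted in Figure 3. Top-k gating with renormalized weights is a common MoE choice because only the selected experts need to be evaluated. Whether CarPLAN uses hard top-k selection or a soft weighting over all experts, and what the router actually consumes, is not stated in this summary.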