
TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning

Zebin Xing, Pengxuan Yang, Linbo Wang, Yichen Zhang, Yiming Hu, Yupeng Zheng, Junli Wang, Yinfeng Gao, Guang Li, Kun Ma, Long Chen, Zhongpu Xia, Qichao Zhang, Hangjun Ye, Dongbin Zhao

TL;DR

TrajMoE addresses scenario-dependent trajectory priors in end-to-end autonomous driving. It uses a Sparse Mixture of Experts (MoE) transformer to adapt the trajectory prior to each scene, fine-tunes the trajectory scorer with reinforcement learning (GRPO, Group Relative Policy Optimization), and ensembles models built on different perception backbones. The combined model scores 51.08 on the NAVSIM ICCV benchmark, placing third. The key contributions are a Sparse MoE for scene-dependent trajectory priors, GRPO-based refinement of trajectory scoring, and multi-backbone ensembling that makes planning more reliable across diverse driving scenarios.
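A minimal sketch of sparse top-k routing, the mechanism a Sparse MoE layer typically uses to send each token to only a few experts. For illustration it assumes that a token is the embedding of one candidate trajectory or scene query, that each expert is an arbitrary function preserving the embedding dimension, and that routing uses a single linear gate whose top-k probabilities are renormalized. The names `sparse_moe`, `gate_weights`, and `k` are hypothetical; TrajMoE's actual expert count, router design, and any load-balancing loss are not described on this page.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]  # assumed to preserve the embedding dimension


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over router logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_moe(
    x: Vector,
    gate_weights: Sequence[Vector],
    experts: Sequence[Expert],
    k: int = 2,
) -> Vector:
    """Route one token through its top-k experts and mix their outputs (hypothetical sketch)."""
    if len(gate_weights) != len(experts) or not 1 <= k <= len(experts):
        raise ValueError("need one gate row per expert and 1 <= k <= num_experts")
    # Router: one linear score per expert, turned into probabilities.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(logits)
    # Sparse selection: only the k most probable experts are evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)
        gate = probs[i] / norm  # renormalized gate weight of expert i
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    # Hypothetical toy setup: a 2-d token embedding and three "experts".
    token = [1.0, 2.0]
    gates = [[10.0, 0.0], [0.0, 0.0], [-10.0, 0.0]]
    experts = [
        lambda v: [2.0 * a for a in v],
        lambda v: [a + 1.0 for a in v],
        lambda v: [0.0 for _ in v],
    ]
    print(sparse_moe(token, gates, experts, k=1))  # [2.0, 4.0]
```

Because only the k selected experts run, per-token compute stays roughly constant as experts are added, which is the usual motivation for a sparse rather than a dense mixture.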

Abstract

Current autonomous driving systems often adopt end-to-end frameworks, which take sensor inputs such as images and learn to map them into trajectory space via neural networks. Previous work has shown that models achieve better planning performance when given a prior distribution over possible trajectories. However, these approaches often overlook two critical aspects: 1) the appropriate trajectory prior can vary significantly across driving scenarios, and 2) their trajectory evaluation mechanism lacks policy-driven refinement and remains constrained by the limitations of one-stage supervised training. To address these issues, we explore improvements in two key areas. For problem 1, we employ an MoE to apply different trajectory priors tailored to different scenarios. For problem 2, we use reinforcement learning to fine-tune the trajectory scoring mechanism. Additionally, we integrate models with different perception backbones to enhance perceptual features. Our integrated model achieved a score of 51.08 on the NAVSIM ICCV benchmark, securing third place.
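GRPO (Group Relative Policy Optimization) replaces a learned value baseline with statistics of a group of samples drawn for the same input: each sample's advantage is its reward minus the group mean, divided by the group standard deviation. The sketch below is a minimal illustration under stated assumptions, not TrajMoE's implementation: it treats the scorer's outputs over a fixed set of candidate trajectories as logits of a categorical policy, assumes each sampled candidate receives a scalar reward (for instance a NAVSIM-style planning score), and omits the KL penalty to a reference policy that GRPO normally includes. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Log-probabilities of a categorical policy whose logits are the scorer outputs."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def sample_group(scores: Sequence[float], group_size: int, rng: random.Random) -> List[int]:
    """Draw a group of candidate-trajectory indices from the current scoring policy."""
    probs = [math.exp(lp) for lp in log_softmax(scores)]
    return rng.choices(range(len(scores)), weights=probs, k=group_size)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO baseline: normalize each reward by the group mean and standard deviation."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def grpo_loss(
    new_scores: Sequence[float],
    old_scores: Sequence[float],
    sampled: Sequence[int],
    rewards: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Clipped surrogate loss averaged over the group (KL penalty omitted)."""
    new_logp = log_softmax(new_scores)
    old_logp = log_softmax(old_scores)
    total = 0.0
    for idx, adv in zip(sampled, group_relative_advantages(rewards)):
        ratio = math.exp(new_logp[idx] - old_logp[idx])  # importance ratio new/old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)  # pessimistic PPO-style bound
    return -total / len(sampled)  # negate: minimizing this maximizes the objective


if __name__ == "__main__":
    rng = random.Random(0)
    old = [0.0, 0.0, 0.0]  # hypothetical scores for three candidate trajectories
    group = sample_group(old, group_size=4, rng=rng)
    rewards = [1.0 if i == 0 else 0.0 for i in group]  # toy reward: candidate 0 is "safe"
    print(group, grpo_loss(old, old, group, rewards))
```

Raising the score of a candidate with above-average reward makes the loss negative, so gradient descent on it pushes probability mass toward trajectories that beat their group, without training a separate critic.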

Paper Structure

This paper contains 8 sections, 10 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: Overview of the TrajMoE architecture.