ExpertAD: Enhancing Autonomous Driving Systems with Mixture of Experts

Haowen Jiang, Xinyu Huang, You Lu, Dingji Wang, Yuheng Cao, Chaofeng Sha, Bihuan Chen, Keyu Chen, Xin Peng

TL;DR

ExpertAD targets the persistent latency and task-interference challenges in end-to-end autonomous driving systems by introducing a Perception Adapter (PA) in the perception stage and a Mixture of Sparse Experts (MoSE) in the prediction stage. The PA dynamically emphasizes task-critical BEV features, while MoSE uses eight specialized sparse-attention experts across Environmental, Ego State, and Navigation tasks, gated by a Router that activates only the most relevant experts. Through joint training with a switch-friendly loss, ExpertAD achieves notable gains in planning effectiveness and inference efficiency across multiple vision-only ADS baselines and generalizes well to unseen urban environments; qualitative case studies illustrate its decision-making. The results support ExpertAD as a practical approach to improving safety and efficiency in autonomous driving by reducing collision rates and latency without sacrificing planning quality.
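The Router's role of activating only the most relevant experts can be illustrated with a generic top-k gating sketch. This is not the paper's implementation: the linear router, the choice of k = 2, the feature dimension, the stand-in linear experts, and the names `route_top_k` and `mose_forward` are all assumptions made for illustration. Only the count of eight experts comes from the text above.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def route_top_k(query, router_weights, k=2):
    """Score every expert and keep the k highest-scoring ones.

    Returns the selected expert indices (best first) and their gate
    weights, renormalized so that they sum to 1.
    """
    probs = softmax(query @ router_weights)   # one score per expert
    top = np.argsort(probs)[::-1][:k]         # indices of the k largest scores
    gates = probs[top] / probs[top].sum()
    return top, gates

def mose_forward(query, router_weights, experts, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    top, gates = route_top_k(query, router_weights, k)
    return sum(g * experts[i](query) for i, g in zip(top, gates))

# Hypothetical setup: eight experts, each a stand-in linear map
# (the paper's experts are sparse-attention modules, not shown here).
rng = np.random.default_rng(0)
dim, num_experts = 16, 8
router_weights = rng.normal(size=(dim, num_experts))
expert_mats = [rng.normal(size=(dim, dim)) for _ in range(num_experts)]
experts = [lambda x, W=W: x @ W for W in expert_mats]
query = rng.normal(size=dim)

output = mose_forward(query, router_weights, experts, k=2)
```

In this sketch, only k of the eight expert functions are evaluated per query, so compute scales with k rather than with the total number of experts, which is the general mechanism by which sparse expert routing can reduce latency.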

Abstract

Recent advancements in end-to-end autonomous driving systems (ADSs) underscore their potential in perception and planning. However, challenges remain. Complex driving scenarios contain rich semantic information, yet ambiguous or noisy semantics can compromise decision reliability, while interference between multiple driving tasks may hinder optimal planning. Furthermore, prolonged inference latency slows decision-making, increasing the risk of unsafe driving behaviors. To address these challenges, we propose ExpertAD, a novel framework that enhances the performance of ADSs with a Mixture of Experts (MoE) architecture. We introduce a Perception Adapter (PA) to amplify task-critical features, ensuring contextually relevant scene understanding, and a Mixture of Sparse Experts (MoSE) to minimize task interference during prediction, allowing for effective and efficient planning. Our experiments show that ExpertAD reduces average collision rates by up to 20% and inference latency by 25% compared to prior methods. We further evaluate its multi-skill planning capabilities in rare scenarios (e.g., accidents, yielding to emergency vehicles) and demonstrate strong generalization to unseen urban environments. Additionally, we present a case study that illustrates its decision-making process in complex driving scenarios.
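To make the idea of amplifying task-critical features concrete, the sketch below applies a squeeze-and-excitation-style channel gate to a BEV feature map. This is a swapped-in, well-known technique used purely as an illustration; the abstract does not specify how the Perception Adapter is built, and the function `channel_gate`, the weight shapes, and the tensor sizes are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_gate(bev, w1, w2):
    """Reweight the channels of a BEV feature map of shape (C, H, W).

    Squeeze: average each channel over the spatial grid.
    Excite: pass the pooled vector through a small two-layer network
    to get one gate in (0, 1) per channel, then scale each channel.
    """
    pooled = bev.mean(axis=(1, 2))             # (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)      # ReLU bottleneck, (C // r,)
    gates = sigmoid(w2 @ hidden)               # (C,)
    return bev * gates[:, None, None], gates

# Hypothetical sizes: 32 channels on a 50 x 50 BEV grid, reduction ratio 4.
rng = np.random.default_rng(0)
channels, height, width, reduction = 32, 50, 50, 4
bev = rng.normal(size=(channels, height, width))
w1 = rng.normal(size=(channels // reduction, channels))
w2 = rng.normal(size=(channels, channels // reduction))

weighted, gates = channel_gate(bev, w1, w2)
```

Channels whose gate is close to 1 pass through nearly unchanged, while channels with a gate near 0 are suppressed, so the downstream modules see a feature map biased toward whatever the gate network has learned to treat as relevant.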

Paper Structure

This paper contains 18 sections, 10 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Collision rate and latency trade-offs across different models. ExpertAD exhibits substantial improvements in planning effectiveness while reducing inference latency, as measured on an NVIDIA GeForce RTX 3090.
  • Figure 2: Overall architecture of ExpertAD. ExpertAD is built upon ADS models by retaining all the original modules except for the perception and prediction modules. The perception module is restructured as the Perception Adapter (PA) to amplify task-critical features, enhancing scene understanding. The prediction module is transformed into the Mixture of Sparse Experts (MoSE), minimizing interference among driving tasks and improving overall open-loop planning performance.
  • Figure 3: ExpertAD recovery visualization. (a) UniAD misses a traffic officer on the right, leading the vehicle to drift toward them. (b) Expert-UniAD detects the officer, adjusts the trajectory, and completes the lane change safely.