Statistic-Augmented, Decoupled MoE Routing and Aggregating in Autonomous Driving

Wei-Bin Kou, Guangxu Zhu, Jingreng Lei, Chen Zhang, Yik-Chung Wu, Jianping Wang

TL;DR

Autonomous driving requires adaptable perception under diverse conditions; this work introduces MoE-RAM, a statistic-augmented, decoupled MoE routing and aggregating framework driven by a pretrained ViT backbone. MoE-RAM employs expert-wise FRLs and Jensen-Shannon-based retrieval/conditioning to route and weight expert outputs, enabling precise, scenario-aware fusion. End-to-end training combines cross-entropy with load-balancing and FRL regularization, and experiments across Cityscapes, CamVid, Apolloscapes, and CARLA_ADV show superior performance and faster convergence versus existing MoE baselines and single-model approaches. The results, along with ablations and visualizations, demonstrate the practical potential of statistically guided MoE routing and aggregation for robust AD semantic segmentation.

Abstract

Autonomous driving (AD) scenarios are inherently complex and diverse, making it difficult for a single deep learning model to cover all possible conditions, such as varying weather, traffic densities, and road types. The Large Model (LM)-driven Mixture of Experts (MoE) paradigm offers a promising solution: the LM serves as the backbone that extracts latent features, while the MoE serves as the downstream head that dynamically selects and aggregates specialized experts to adapt to different scenarios. However, routing and aggregating in MoE face intrinsic challenges, including imprecise expert selection caused by flawed routing strategies and inefficient expert aggregation that leads to suboptimal predictions. To address these issues, we propose a statistic-augmented, decoupled MoE Routing and Aggregating Mechanism (MoE-RAM) driven by an LM. On the one hand, MoE-RAM enhances expert routing by incorporating a statistical retrieval mechanism that matches LM-extracted latent features with the cached prototypical features of the most relevant experts. On the other hand, MoE-RAM adaptively reweights the experts' outputs during fusion by measuring the statistical distances between the experts' instant features and the LM-extracted latent features. Benefiting from the synergy between statistic-augmented routing and aggregating, MoE-RAM improves prediction performance. We take the AD semantic segmentation task as an example to assess the proposed MoE-RAM. Extensive experiments on AD datasets demonstrate the superiority of MoE-RAM over other MoE baselines and conventional single-model approaches.
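
To make the two decoupled steps concrete, the following is a minimal sketch of distance-based routing and aggregation, not the paper's implementation. It assumes that features are turned into distributions with a softmax, that routing keeps the top-k experts with the smallest Jensen-Shannon divergence to their cached prototypes, and that fusion weights are proportional to exp(-distance / temperature). All function names, the top-k rule, the weighting formula, and the toy data are hypothetical choices made for illustration; plain Python lists are used to keep the sketch dependency-free.

```python
import math

def softmax(x):
    """Normalize a feature vector into a probability distribution (assumed step)."""
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q), skipping zero-probability terms of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log), symmetric and bounded by ln 2."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def route_top_k(latent, prototypes, k):
    """Routing sketch: pick the k experts whose cached prototype is closest to the latent feature."""
    p = softmax(latent)
    dists = [js_divergence(p, softmax(proto)) for proto in prototypes]
    order = sorted(range(len(dists)), key=dists.__getitem__)
    return order[:k], dists

def aggregate(latent, expert_features, expert_outputs, temperature=1.0):
    """Aggregation sketch: weight each expert output by exp(-JS / temperature) (assumed formula)."""
    p = softmax(latent)
    dists = [js_divergence(p, softmax(f)) for f in expert_features]
    raw = [math.exp(-d / temperature) for d in dists]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(expert_outputs[0])
    fused = [sum(w * out[i] for w, out in zip(weights, expert_outputs)) for i in range(dim)]
    return fused, weights

# Toy data (hypothetical): one latent feature and three cached expert prototypes.
latent = [2.0, 0.5, -1.0, 0.0]
prototypes = [
    [-1.0, 2.0, 0.5, 0.0],
    [2.1, 0.4, -0.9, 0.1],
    [0.0, 0.0, 0.0, 0.0],
]
selected, distances = route_top_k(latent, prototypes, k=2)

# Here the prototypes stand in for the selected experts' instant features.
features = [prototypes[i] for i in selected]
outputs = [[1.0, 0.0], [0.0, 1.0]]
fused, weights = aggregate(latent, features, outputs)
```

Keeping routing and aggregation as separate functions mirrors the decoupling described in the abstract: the first compares the latent feature against cached prototypes, while the second compares it against features produced at inference time, so each step can be tuned or replaced on its own.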

Paper Structure

This paper contains 25 sections, 12 equations, 3 figures, 7 tables, 1 algorithm.

Figures (3)

  • Figure 1: Illustration of the proposed MoE-RAM.
  • Figure 2: Convergence comparison of the proposed MoE-RAM against other MoE baselines and single-model approaches.
  • Figure 3: Relationship visualization between ViT-extracted features and expert-wise FRL prototypes.