AW-MoE: All-Weather Mixture of Experts for Robust Multi-Modal 3D Object Detection

Hongwei Lin; Xun Huang; Chenglu Wen; Cheng Wang

Abstract

Robust 3D object detection under adverse weather conditions is crucial for autonomous driving. However, most existing methods simply combine samples from all weather conditions for training and overlook the data distribution discrepancies between weather scenarios, which leads to performance conflicts. To address this issue, we introduce AW-MoE, a framework that integrates Mixture of Experts (MoE) into weather-robust multi-modal 3D object detection. AW-MoE incorporates Image-guided Weather-aware Routing (IWR), which leverages the superior discriminability of image features across weather conditions and their invariance to scene variations for precise weather classification. Based on this classification, IWR selects the top-K most relevant Weather-Specific Experts (WSE), which handle the data discrepancies and support reliable detection under all weather conditions. Additionally, we propose Unified Dual-Modal Augmentation (UDMA), which augments LiDAR and 4D Radar data synchronously while preserving the realism of scenes. Extensive experiments on a real-world dataset demonstrate that AW-MoE achieves an improvement of approximately 15% in adverse-weather performance over state-of-the-art methods while incurring negligible inference overhead. Moreover, integrating AW-MoE into established baseline detectors yields performance that surpasses current state-of-the-art methods. These results demonstrate the effectiveness and strong scalability of AW-MoE. We will release the code publicly at https://github.com/windlinsherlock/AW-MoE.
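The top-K routing idea in the abstract can be illustrated with a minimal sketch. It assumes that the image-based weather classifier outputs one logit per weather class, that each class has exactly one expert, and that the selected experts are combined with renormalized softmax weights; this gating rule is common MoE practice and is an assumption here, not a detail confirmed by the abstract. The class names, the toy experts, and the helper names `route_top_k` and `mix_experts` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(weather_logits, k):
    """Return [(expert_index, weight)] for the k most probable weather classes.

    Weights are the class probabilities renormalized over the selected
    experts (an assumed gating rule, not necessarily the paper's).
    """
    probs = softmax(weather_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def mix_experts(feature, experts, routing):
    """Weighted sum of the outputs of the selected experts only."""
    out = [0.0] * len(feature)
    for idx, weight in routing:
        expert_out = experts[idx](feature)
        out = [o + weight * v for o, v in zip(out, expert_out)]
    return out


# Hypothetical weather classes, one toy "expert" per class (scales the feature).
WEATHERS = ["normal", "fog", "rain", "snow"]
experts = [lambda f, s=s: [s * v for v in f] for s in (1.0, 2.0, 3.0, 4.0)]

# Pretend classifier logits that favor "fog", then "rain".
routing = route_top_k([0.1, 2.0, 1.5, -1.0], k=2)
fused = mix_experts([1.0, 1.0], experts, routing)
```

Only the K selected experts are evaluated, so the cost at inference time grows with K rather than with the total number of experts.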

Paper Structure (33 sections, 9 equations, 8 figures, 10 tables, 1 algorithm)

Introduction
Related Work
3D Object Detection.
3D Object Detection Under Adverse Weather.
Mixture of Experts (MoE).
Proposed method
Problem Formulation
AW-MoE
Unified Dual-Modal Augmentation
Image-guided Weather-aware Routing
Weather-Specific Experts
AW-MoE-LRC: Integrating Image Features
LiDAR-Guided Image Feature Lifting
3D Geometry Transformation and BEV Pooling (Splatting)
Multi-Modal Feature Fusion
...and 18 more sections

Figures (8)

Figure 1: Comparison of weather-type discriminability between camera images and LiDAR point clouds. (a, b) Camera images exhibit distinct visual characteristics and robustness to scene variations, facilitating accurate weather classification. (c, d) In contrast, LiDAR point clouds suffer from ambiguous inter-class geometric distortions and scene-induced intra-class distribution shifts, which obscure the boundaries between different weather conditions.
Figure 2: (a) Performance changes of L4DR after fine-tuning on a single weather condition under different weather scenarios. (b) Statistics of data volume across different weather conditions in the K-Radar dataset.
Figure 3: Method comparison between Point-cloud Feature-based Routing (PFR) and the proposed Image-guided Weather-aware Routing (IWR).
Figure 4: AW-MoE Framework. (a) Unified Dual-Modal Augmentation (UDMA): Synchronously augments LiDAR and 4D Radar point clouds. Its GT Sampling only selects ground truths matching the scene's weather. (b) Image-guided Weather-aware Routing (IWR): Uses an Image-based Weather Classifier to predict the scene weather and routes the feature to the top-K most relevant Weather-Specific Experts. (c) Weather-Specific Experts (WSE): Each expert is specialized for a weather condition, extracting robust weather-specific features and regressing bounding boxes with tailored sensitivity.
Figure 5: The architecture of the AW-MoE-LRC framework. The pipeline comprises three stages: (i) LiDAR-Guided Image Feature Lifting, where sparse LiDAR depth assists in predicting 3D frustum features from images; (ii) 3D Geometry Transformation and BEV Pooling, which projects and aggregates these features into the ego-vehicle BEV space; and (iii) Multi-Modal Feature Fusion, which concatenates the aligned camera, LiDAR, and 4D Radar BEV features along the channel dimension for final convolution-based integration.
...and 3 more figures
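The UDMA behavior described in the Figure 4 caption (synchronous augmentation of both point clouds, and GT sampling restricted to the scene's weather) can be sketched as follows. This is a minimal illustration under stated assumptions: points are tuples of `(x, y, z, extra...)`, the augmentation is a global flip, yaw rotation, and scaling, and the parameter ranges are placeholders. All function names and the ground-truth database layout are hypothetical and are not taken from the released code.

```python
import math
import random


def sample_transform(rng):
    """Draw ONE set of global augmentation parameters for the whole scene."""
    return {
        "flip_y": rng.random() < 0.5,                   # mirror across the x-axis
        "yaw": rng.uniform(-math.pi / 4, math.pi / 4),  # assumed range
        "scale": rng.uniform(0.95, 1.05),               # assumed range
    }


def apply_transform(points, t):
    """Apply the transform to points given as (x, y, z, *extra) tuples."""
    c, s = math.cos(t["yaw"]), math.sin(t["yaw"])
    out = []
    for x, y, z, *extra in points:
        if t["flip_y"]:
            y = -y
        x, y = c * x - s * y, s * x + c * y
        # Per-point attributes (intensity, radar power, ...) are left untouched.
        out.append((x * t["scale"], y * t["scale"], z * t["scale"], *extra))
    return out


def augment_pair(lidar, radar, rng):
    """Augment both modalities with the same parameters so they stay aligned."""
    t = sample_transform(rng)
    return apply_transform(lidar, t), apply_transform(radar, t)


def weather_matched_samples(gt_database, scene_weather):
    """Keep only ground-truth objects recorded under the scene's weather."""
    return [g for g in gt_database if g["weather"] == scene_weather]


rng = random.Random(0)
lidar = [(10.0, 2.0, 0.5, 0.8)]    # x, y, z, intensity
radar = [(10.0, 2.0, 0.5, 12.0)]   # x, y, z, power
lidar_aug, radar_aug = augment_pair(lidar, radar, rng)

gt_db = [{"weather": "fog", "id": 0}, {"weather": "rain", "id": 1}]
fog_only = weather_matched_samples(gt_db, "fog")
```

Drawing the parameters once per scene, rather than once per sensor, is what keeps the two point clouds geometrically consistent after augmentation.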
