MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection

Youngmin Oh, Hyung-Il Kim, Seong Tae Kim, Jung Uk Kim

TL;DR

Monocular 3D object detection often fails under adverse weather like fog. The authors propose MonoWAD, a weather-robust detector that combines a weather codebook to memorize clear-weather knowledge and generate weather-reference features with a weather-adaptive diffusion model that uses the fog distribution $\mathcal{F}=x^f-x^c$ to progressively enhance feature representations. Two losses, $\mathcal{L}_{ckr}$ and $\mathcal{L}_{wae}$, guide the codebook and diffusion model, and standard detection loss $\mathcal{L}_{OD}$ completes the end-to-end objective $\mathcal{L}_{Total}$. Experiments on KITTI, Foggy KITTI, and Virtual KITTI show improved weather robustness, outperforming state-of-the-art monocular detectors under foggy and mixed conditions, with code and data released for reproducibility. This approach offers a practical path toward reliable perception in real-world autonomous systems across diverse weather scenarios.
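The fog distribution $\mathcal{F}=x^f-x^c$ mentioned above can be illustrated with a small sketch. It assumes features are flat lists of floats and uses a hypothetical linear schedule to move a clear feature toward its foggy counterpart step by step; the function names `fog_distribution` and `forward_step` and the schedule itself are illustrative assumptions, not the authors' implementation.

```python
def fog_distribution(x_fog, x_clear):
    """Element-wise fog distribution F = x^f - x^c."""
    if len(x_fog) != len(x_clear):
        raise ValueError("foggy and clear features must have the same length")
    return [f - c for f, c in zip(x_fog, x_clear)]


def forward_step(x_clear, fog, n, num_steps):
    """Hypothetical linear schedule (an assumption): x_n = x^c + (n / N) * F."""
    if not 0 <= n <= num_steps:
        raise ValueError("n must lie in [0, num_steps]")
    scale = n / num_steps
    return [c + scale * d for c, d in zip(x_clear, fog)]


# Toy features standing in for backbone outputs (made-up values).
x_clear = [1.0, 2.0, 4.0]
x_fog = [0.5, 1.0, 3.0]

fog = fog_distribution(x_fog, x_clear)
# Step 0 is the clear feature; the last step is the foggy feature.
trajectory = [forward_step(x_clear, fog, n, 4) for n in range(5)]
```

Under this assumed schedule, a reverse (enhancement) process would aim to walk the trajectory backwards, from a foggy feature toward the clear one.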

Abstract

Monocular 3D object detection is an important yet challenging task in autonomous driving. Existing methods mainly focus on performing 3D detection in ideal weather conditions, characterized by clear scenes with optimal visibility. However, autonomous driving requires the ability to handle changing weather conditions, such as fog, not just clear weather. We introduce MonoWAD, a novel weather-robust monocular 3D object detector with a weather-adaptive diffusion model. It contains two components: (1) the weather codebook, which memorizes knowledge of clear weather and generates a weather-reference feature for any input, and (2) the weather-adaptive diffusion model, which enhances the representation of the input feature by incorporating the weather-reference feature. This serves as an attention mechanism, indicating how much improvement the input feature needs under the given weather conditions. To achieve this goal, we introduce a weather-adaptive enhancement loss that enhances the feature representation under both clear and foggy weather conditions. Extensive experiments under various weather conditions demonstrate that MonoWAD achieves weather-robust monocular 3D object detection. The code and dataset are released at https://github.com/VisualAIKHU/MonoWAD.
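To make the idea of a weather codebook more concrete, here is a minimal sketch of one plausible read-out: the input feature attends over a set of stored slots, and the weather-reference feature is the attention-weighted sum of those slots. The soft-attention read-out, the slot values, and the names `softmax` and `recall_reference` are assumptions chosen for illustration; the abstract does not specify how the codebook is queried.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recall_reference(x, codebook):
    """Assumed read-out: dot-product attention of x over the codebook slots."""
    scores = [sum(a * b for a, b in zip(x, slot)) for slot in codebook]
    weights = softmax(scores)
    dim = len(codebook[0])
    # Reference feature = attention-weighted sum of the slots.
    reference = [
        sum(w * slot[d] for w, slot in zip(weights, codebook))
        for d in range(dim)
    ]
    return reference, weights


# Two hypothetical slots standing in for memorized clear-weather knowledge.
codebook = [[1.0, 0.0], [0.0, 1.0]]
x_input = [2.0, 0.0]  # toy input feature, more similar to the first slot

reference, weights = recall_reference(x_input, codebook)
```

In this toy setup the first slot receives the larger weight, so the recalled reference leans toward it regardless of how the input was degraded, which is the behavior the codebook is described as providing.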

Paper Structure (44 sections, 13 equations, 13 figures, 12 tables)


Figures (13)

  • Figure 1: Conceptual diagram of the proposed method (foggy example). (a) In the training phase, the weather codebook learns clear-weather knowledge and transfers it to the weather-adaptive diffusion model, which enhances content according to the weather conditions. (b) As a result, monocular 3D object detection adapts to input images captured under various weather conditions (e.g., foggy images).
  • Figure 2: Overview of our MonoWAD in the inference phase. It mainly contains three parts: weather codebook, weather-adaptive diffusion model, and detection block. Through the weather codebook and weather-adaptive diffusion model, our method can maintain robustness against various weather conditions (i.e., clear or foggy).
  • Figure 3: Illustration of the proposed (a) weather-invariant guiding (WIG) loss and (b) clear knowledge embedding (CKE) loss. The clear knowledge recalling (CKR) loss, obtained from combining WIG and CKE, aims to memorize the knowledge of the clear weather and recall the same clear knowledge from the foggy weather.
  • Figure 4: Training process of the weather-adaptive diffusion model, which consists of two processes: (1) adding the fog variant $\boldsymbol{\epsilon}_n$, starting from the input clear feature $x^c$ (forward process), and (2) enhancing the representation with the weather-reference feature $x^r$ (reverse process).
  • Figure 5: Comparison of 3D detection examples (green: ground-truth, red: predicted 3D bounding-box) between our MonoWAD and two detectors, MonoDTR and MonoDETR, that show the most improved performances among existing methods.
  • ...and 8 more figures