WeatherDG: LLM-assisted Diffusion Model for Procedural Weather Generation in Domain-Generalized Semantic Segmentation

Chenghao Qian, Yuhu Guo, Yuhong Mo, Wenjing Li

TL;DR

WeatherDG tackles domain generalization for semantic segmentation under adverse weather by combining a fine-tuned Stable Diffusion (SD) model with a chain of LLM agents to generate realistic, weather-diverse driving scenes. The approach has three stages: SD fine-tuning to inject driving-scene priors; procedural prompt generation by three agents (instance sampling, scene composition, and scene description); and training with unsupervised domain adaptation (UDA) techniques to leverage the synthetic data. Key contributions are the alignment of driving-scene priors, the three-agent prompt-generation framework with a balanced sampling strategy for tailed classes, and demonstrated improvements across Cityscapes→ACDC, BDD100K, and DarkZurich. The framework is agnostic to the segmentation model and improves the HRDA baseline by 13.9% mIoU on Cityscapes→ACDC, which makes it practical for improving the robustness of autonomous-driving perception in varied weather.

Abstract

In this work, we propose a novel approach, namely WeatherDG, that can generate realistic, weather-diverse, driving-scene images based on the cooperation of two foundation models, i.e., Stable Diffusion (SD) and a Large Language Model (LLM). Specifically, we first fine-tune the SD with source data, aligning the content and layout of generated samples with real-world driving scenarios. Then, we propose a procedural prompt generation method based on the LLM, which can enrich scenario descriptions and help SD automatically generate more diverse, detailed images. In addition, we introduce a balanced generation strategy, which encourages the SD to generate high-quality objects of tailed classes, such as riders and motorcycles, under various weather conditions. This segmentation-model-agnostic method can improve the generalization ability of existing models by additionally adapting them with the generated synthetic data. Experiments on three challenging datasets show that our method can significantly improve the segmentation performance of different state-of-the-art models on target domains. Notably, in the "Cityscapes to ACDC" setting, our method improves the baseline HRDA by 13.9% in mIoU.
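The balanced generation strategy can be pictured with a small sketch. The code below is not the authors' implementation: it assumes inverse-frequency weighting as a stand-in for the balanced sampling rule, uses made-up class frequencies, and replaces the LLM agents with a fixed string template. The names `CLASS_FREQ`, `balanced_weights`, and `sample_prompt` are hypothetical.

```python
import random

# Hypothetical per-class frequencies (illustrative values, not dataset statistics).
CLASS_FREQ = {"car": 0.60, "person": 0.25, "rider": 0.10, "motorcycle": 0.05}

# Weather and lighting conditions named in the Figure 1 caption.
WEATHER = ["foggy", "nighttime", "snowy", "rainy"]


def balanced_weights(freq):
    """Return inverse-frequency sampling weights, normalized to sum to 1.

    Assumption: rarer (tailed) classes should be sampled more often.
    """
    inverse = {name: 1.0 / f for name, f in freq.items()}
    total = sum(inverse.values())
    return {name: w / total for name, w in inverse.items()}


def sample_prompt(rng, freq=CLASS_FREQ, weather=WEATHER):
    """Sample one instance class and one condition, then fill a template.

    The template stands in for the LLM agents, which would write a much
    richer scene description.
    """
    weights = balanced_weights(freq)
    classes = list(weights)
    instance = rng.choices(classes, weights=[weights[c] for c in classes], k=1)[0]
    condition = rng.choice(weather)
    return f"a {condition} driving scene with a {instance}"


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible sampling
    for _ in range(3):
        print(sample_prompt(rng))
```

Under this weighting, `motorcycle` is drawn about twelve times as often as `car`, so the prompts sent to the fine-tuned diffusion model over-represent tailed classes relative to the source data.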

Paper Structure

This paper contains 21 sections, 3 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Visualization of domain-generalized semantic segmentation results: MIC [hoyer2023mic] (red box) vs. WeatherDG (green box). The tested images include foggy, nighttime, snowy, and rainy scenarios.
  • Figure 2: Comparison between synthetic and real-world images under adverse weather conditions. Images generated by (a) a driving simulator [kerim2022Semantic] lack intricate details and natural lighting, whereas (b) Stable Diffusion [rombach2021highresolution] typically produces images with an artistic flair. In contrast, (c) our method produces the most realistic images, closely resembling (d) diverse real-world driving scenes.
  • Figure 3: Overview of the WeatherDG pipeline. (a) We first fine-tune a text-to-image diffusion model to integrate scene priors from the source domain. This ensures that the images generated by the diffusion model accurately depict driving scenes. (b) Next, we employ a chain of LLM agents to procedurally construct detailed prompts that enrich tailed classes and produce diverse weather and lighting effects with the fine-tuned model. (c) After generating images with these prompts, we use UDA techniques to train on these images together with the source-domain dataset, followed by evaluation on real-world target datasets.
  • Figure 4: The detailed process of fine-tuning Stable Diffusion [rombach2021highresolution].
  • Figure 5: The process of prompt generation by gradually introducing LLM agents: $\mathcal{E}_{\mathit{IS}}$, $\mathcal{E}_{\mathit{SC}}$ and $\mathcal{E}_{\mathit{SD}}$. The images correspond to the results generated using prompts created by different LLM agents.
  • ...and 3 more figures