WeatherProof: Leveraging Language Guidance for Semantic Segmentation in Adverse Weather

Blake Gella, Howard Zhang, Rishi Upadhyay, Tiffany Chang, Nathan Wei, Matthew Waliman, Yunhao Ba, Celso de Melo, Alex Wong, Achuta Kadambi

TL;DR

This work addresses semantic segmentation under adverse weather by introducing WeatherProof, a dataset of paired clear and adverse-weather images, together with a language-guided robustness framework. A CLIP Injection Layer injects weather-composition priors via cross-attention, using 20 weather-related prompts to bias feature representations toward weather-aware semantics. Across InternImage, ConvNeXt, and Swin backbones, the approach yields absolute mIoU gains of up to $10.2\%$ on WeatherProof, $8.44\%$ on ACDC, and $3.9\%$ on the man-made A2I2-Haze dataset, while preserving clear-weather performance. These results demonstrate that vision-language priors can effectively narrow the solution space under complex degradations and suggest broader applicability to other robustness challenges and downstream tasks.

Abstract

We propose a method to infer semantic segmentation maps from images captured under adverse weather conditions. We begin by examining existing models on images degraded by weather conditions such as rain, fog, or snow, and find that they exhibit a large performance drop compared to their performance on images captured under clear weather. To control for changes in scene structure, we propose WeatherProof, the first semantic segmentation dataset with accurate clear and adverse weather image pairs that share an underlying scene. Using this dataset, we analyze the error modes of existing models and find that they are sensitive to the highly complex combination of different weather effects induced on the image during capture. To improve robustness, we propose a way to use language as guidance by identifying the contributions of adverse weather conditions and injecting them as "side information". Models trained using our language guidance exhibit performance gains of up to 10.2% in mIoU on WeatherProof, up to 8.44% in mIoU on the widely used ACDC dataset compared to standard training techniques, and up to 6.21% in mIoU on the ACDC dataset compared to previous SOTA methods.
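
The gains quoted above are absolute differences in mIoU (mean intersection-over-union). As a minimal sketch of how such a gap between a clear and a degraded prediction can be measured, the snippet below computes mIoU on toy flattened label maps. The label values and the helper name `mean_iou` are illustrative assumptions, and the exact evaluation protocol used in the paper (for example, per-image versus dataset-level accumulation) is not specified in this section.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that appear in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened label maps with 3 classes (hypothetical values).
target = [0, 0, 1, 1, 2, 2]
clear_pred = [0, 0, 1, 1, 2, 2]   # prediction on the clear image
rainy_pred = [0, 1, 1, 1, 2, 0]   # prediction on the paired degraded image

clear_miou = mean_iou(clear_pred, target, 3)
rainy_miou = mean_iou(rainy_pred, target, 3)

# Absolute drop in mIoU, expressed in percentage points.
drop_points = 100.0 * (clear_miou - rainy_miou)
print(f"clear={clear_miou:.3f} rainy={rainy_miou:.3f} drop={drop_points:.1f} points")
```

Because the two predictions are scored against the same ground truth, the difference isolates the effect of the degradation, which is the kind of comparison a paired dataset makes possible.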

Paper Structure

This paper contains 31 sections, 6 equations, 7 figures, and 12 tables.

Figures (7)

  • Figure 1: By leveraging CLIP-based language guidance, our models perform up to 10.2% better on our WeatherProof test set, and 8.4% better on the widely used ACDC dataset as compared to standard fine-tuning procedures.
  • Figure 2: The train and test sets of WeatherProof include paired sets of varied combinations of weather effects. Top: Various types of weather effects and their compositions from the training set. Bottom: Weather effects and combinations in our test set. Change in mIoU between clear and degraded images of the InternImage baseline is shown in yellow. Note the significant impact on mIoU results of multiple combined weather effects.
  • Figure 3: The WeatherProof dataset contains accurate clear and adverse weather image pairs with 10 semantic classes. The dataset includes rain, snow, and fog weather effects. The labels below the image are for the WeatherProof dataset. In contrast, the paired images in the ACDC [sakaridis2021acdc] and IDD-AW [shaik2024idd] datasets either have major differences in semantic information and scene structure or are not in RGB space.
  • Figure 4: By using CLIP-based language guidance, models are able to generate features that are more resilient to adverse weather conditions. During training, a CLIP-Guided Injection module learns a CLIP-informed prior representing the adverse weather effect in the CLIP latent space. This is concatenated with the image latent before being fed in through cross-attention layers into the model.
  • Figure 5: Our CLIP injection layer is able to accurately predict the composition of weather effects in images. The percentages of weather effect contributions were obtained by passing these images through our CLIP injection layer and extracting the weights $\boldsymbol{v}$.
  • ...and 2 more figures
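
The captions for Figures 4 and 5 describe a prior built from per-prompt weights $\boldsymbol{v}$ and concatenated with the image latent. The sketch below illustrates that idea under loud assumptions: it is not the authors' implementation. The weights here come from a fixed softmax over cosine similarities, whereas the paper's module is learned. The three prompts, the 3-dimensional toy embeddings, the `temperature` value, and the function name `weather_prior` are all hypothetical, and the cross-attention step is omitted.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weather_prior(image_emb, prompt_embs, temperature=0.1):
    """Return (weights v, prior) for an image and a set of prompt embeddings.

    Assumption: v is a softmax over cosine similarities; the prior is the
    v-weighted sum of the normalized prompt embeddings.
    """
    img = normalize(image_emb)
    prompts = [normalize(p) for p in prompt_embs]
    sims = [sum(a * b for a, b in zip(img, p)) / temperature for p in prompts]
    v = softmax(sims)
    prior = [sum(w * p[i] for w, p in zip(v, prompts)) for i in range(len(img))]
    return v, prior


# Toy stand-ins for CLIP text embeddings of weather prompts (hypothetical).
prompt_names = ["rain", "snow", "fog"]
prompt_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.9, 0.1, 0.4]  # toy stand-in for a CLIP image embedding

v, prior = weather_prior(image_emb, prompt_embs)
context = image_emb + prior  # concatenation of image latent and prior

for name, weight in zip(prompt_names, v):
    print(f"{name}: {100.0 * weight:.1f}%")
```

In this toy setup the weights sum to one, so they can be read as a weather composition in the spirit of the percentages in Figure 5; in a full model, `context` would be the input attended to by the cross-attention layers.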