From Filters to VLMs: Benchmarking Defogging Methods through Object Detection and Segmentation Performance

Ardalan Aryashad, Parsa Razmara, Amin Mahjoub, Seyedarmin Azizi, Mahdi Salmani, Arad Firouzkouhi

TL;DR

The study benchmarks a broad spectrum of defogging pipelines, from handcrafted filters and learned restorers to chained variants and VLM-driven edits, using Foggy Cityscapes to assess downstream object detection and segmentation under fog. By fixing the detector and segmenter and varying only the preprocessing, it finds that model→filter chains often outperform filter→model configurations and that prompt-guided VLM edits can approach ground-truth performance in some cases, though they remain inconsistent on real-world fog. A chain-of-thought (CoT) prompt for VLM editors improves perceptual cues that correlate with detection gains, and qualitative rubric scores align strongly with mAP (r ≈ 0.94). The work provides a transparent, end-to-end benchmark and highlights the domain-transfer gap between synthetic and real fog, underscoring the need for real fog data and task-driven evaluation to ensure robust perception in adverse weather.

Abstract

Autonomous driving perception systems are particularly vulnerable in foggy conditions, where light scattering reduces contrast and obscures fine details critical for safe operation. While numerous defogging methods exist, from handcrafted filters to learned restoration models, improvements in image fidelity do not consistently translate into better downstream detection and segmentation. Moreover, prior evaluations often rely on synthetic data, leaving questions about real-world transferability. We present a structured empirical study that benchmarks a comprehensive set of pipelines, including (i) classical filters, (ii) modern defogging networks, (iii) chained variants (filter→model, model→filter), and (iv) prompt-driven visual-language image editing models (VLM) applied directly to foggy images. Using Foggy Cityscapes, we assess both image quality and downstream performance on object detection (mAP) and segmentation (PQ, RQ, SQ). Our analysis reveals when defogging helps, when chaining yields synergy or degradation, and how VLM-based editors compare to dedicated approaches. In addition, we evaluate qualitative rubric-based scores from a VLM judge and quantify their alignment with task metrics, showing strong correlations with mAP. Together, these results establish a transparent, task-oriented benchmark for defogging methods and highlight the conditions under which preprocessing genuinely improves autonomous perception in adverse weather.
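
The chained variants named in the abstract differ only in the order in which a filter and a model are applied. The sketch below illustrates why that order can matter. It is a toy illustration, not the paper's implementation: `contrast_stretch` and `placeholder_restorer` are hypothetical stand-ins for a classical filter and a learned defogging network, and the "image" is a short list of made-up pixel intensities.

```python
from typing import Callable, List

Image = List[float]              # flattened grayscale pixels in [0, 1]
Stage = Callable[[Image], Image]


def contrast_stretch(img: Image) -> Image:
    """Hypothetical stand-in for a classical filter: rescale to span [0, 1]."""
    lo, hi = min(img), max(img)
    if hi == lo:                 # flat image: nothing to stretch
        return list(img)
    return [(v - lo) / (hi - lo) for v in img]


def placeholder_restorer(img: Image) -> Image:
    """Hypothetical stand-in for a learned defogging network (a gamma curve)."""
    return [v ** 2.0 for v in img]


def chain(*stages: Stage) -> Stage:
    """Compose stages left to right, so chain(a, b)(x) == b(a(x))."""
    def run(img: Image) -> Image:
        for stage in stages:
            img = stage(img)
        return img
    return run


# The two chaining orders; the downstream detector would stay fixed.
pipelines = {
    "filter->model": chain(contrast_stretch, placeholder_restorer),
    "model->filter": chain(placeholder_restorer, contrast_stretch),
}

foggy = [0.5, 0.6, 0.7, 0.8]     # made-up low-contrast pixels, not real data
outputs = {name: run(foggy) for name, run in pipelines.items()}

for name, out in outputs.items():
    print(name, [round(v, 3) for v in out])
```

Because the two stand-in stages do not commute, the two orders produce different images from the same input, which is the reason each order has to be evaluated separately against the fixed detector and segmenter.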

Paper Structure

This paper contains 16 sections, 2 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Example showing how defogging improves downstream object detection by restoring visibility of fog-obscured objects.
  • Figure 2: Comparison of defogging results on a real foggy image. The model trained on synthetic fog (DehazeFormer) shows color distortion and overcorrection, while the Flux CoT model better restores object visibility and preserves natural appearance, highlighting the domain gap between synthetic and real fog.
  • Figure 3: Correlation of mAP and qualitative scores across models. Ground truth provides the upper bound for both metrics. The two measures are strongly correlated ($r = 0.94$), indicating that qualitative evaluation aligns closely with quantitative detection performance.
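
The correlation reported for Figure 3 is presumably a Pearson coefficient computed over per-method (mAP, qualitative score) pairs; the source does not spell out the computation, so the following is a minimal sketch under that assumption. The helper name `pearson_r` and the numbers in the usage example are illustrative placeholders, not values from the paper.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(ss_x * ss_y)


# Made-up per-method scores, purely for illustration (not paper results).
toy_map = [0.20, 0.25, 0.31, 0.40]
toy_rubric = [2.0, 2.6, 3.1, 4.2]
print(f"r = {pearson_r(toy_map, toy_rubric):.3f}")
```

With one (mAP, rubric score) pair per defogging method, a value of r near 1 means that methods a VLM judge rates higher also tend to yield higher detection accuracy.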