From Events to Clarity: The Event-Guided Diffusion Framework for Dehazing

Ling Wang, Yunfan Lu, Wenzong Ma, Huizai Yao, Pengteng Li, Hui Xiong

TL;DR

This work tackles dehazing under dense haze by leveraging high-dynamic-range information from synchronized event cameras. It reframes dehazing as conditional image generation within a latent diffusion framework, injecting sparse but informative event features into the denoising process via cross-attention and a Temporal Pyramid Representation. The proposed EvDehaze comprises a frozen VQ-VAE backbone, an efficient Events Representation Model, and an Event-Guided Diffusion Module, achieving state-of-the-art performance among diffusion-based methods and enhancing perceptual realism in challenging scenarios. A real-world RGB–event drone dataset under heavy haze substantiates practical applicability, while ablations confirm the critical role of event guidance and cross-attention in preserving structure and contrast.

Abstract

Clear imaging under hazy conditions is a critical task. Prior-based and neural methods have improved results. However, they operate on RGB frames, which suffer from limited dynamic range. As a result, dehazing remains ill-posed and can erase structure and illumination details. To address this, we use event cameras for dehazing for the \textbf{first time}. Event cameras offer a much higher dynamic range ($120\,\mathrm{dB}$ vs. $60\,\mathrm{dB}$) and microsecond latency, which makes them well suited to hazy scenes. In practice, transferring HDR cues from events to frames is hard because real paired data are scarce. To tackle this, we propose an event-guided diffusion model that uses the strong generative priors of diffusion models to reconstruct clear images from hazy inputs by transferring HDR information from events. Specifically, we design an event-guided module that maps sparse HDR event features, \textit{e.g.,} edges and corners, into the diffusion latent space. This clear conditioning provides precise structural guidance during generation, improves visual realism, and reduces semantic drift. For real-world evaluation, we collect a drone dataset in heavy haze (AQI = 341) with synchronized RGB and event sensors. On two benchmarks and our dataset, the method achieves state-of-the-art results.
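
The two dynamic-range figures quoted in the abstract can be read as intensity ratios. As a rough worked conversion, assuming the $20\log_{10}$ convention commonly used for image-sensor dynamic range (the abstract does not state which convention it uses):

$$
\mathrm{DR}_{\mathrm{dB}} = 20\log_{10}\frac{I_{\max}}{I_{\min}}
\quad\Longrightarrow\quad
\frac{I_{\max}}{I_{\min}} = 10^{\mathrm{DR}_{\mathrm{dB}}/20}
$$

$$
120\,\mathrm{dB} \;\rightarrow\; 10^{6}:1,
\qquad
60\,\mathrm{dB} \;\rightarrow\; 10^{3}:1
$$

Under this assumption, the event sensor covers an intensity ratio about $10^{3}$ times wider than the frame sensor.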

Paper Structure

This paper contains 11 sections, 16 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Overview of the motivation and the proposed EvDehaze method. Haze compresses the dynamic range of conventional RGB frames $I_i$ (see Sec. \ref{sec:preliminary}) captured in heavy pollution (AQI = 341), as illustrated in (a)-right. In contrast, events $E$ captured by event cameras offer a much higher dynamic range (up to $120\,\mathrm{dB}$), as illustrated in (a)-left, and thus provide critical cues under haze. Our method, EvDehaze, uses these events to guide dehazing (Sec. \ref{sec:method}), producing clearer outputs $I_o$ as shown in (c).
  • Figure 2: Overview of the proposed EvDehaze framework. Given a hazy frame $I_i$ and its corresponding event stream $E$, our model generates a dehazed output $I_o$. The pipeline consists of three main components: (1) a frozen VQ-VAE \cite{razavi2019generating} for encoding and decoding image latents; (2) an Event Encoder that extracts multi-scale representations from $E$ via TPR and Conv+Pooling; and (3) a denoising U-Net that performs iterative refinement from noise $x_T$ to clean latent $x_0$, guided by event features via cross-attention.
  • Figure 3: Our real-world data acquisition system mounted on the DJI Matrice 300 RTK platform. The payload includes a Prophesee EVK4 HD event camera and an Intel RealSense D435i RGB camera with closely aligned viewpoints. The setup enables synchronized RGB-event recording in outdoor hazy conditions for long-range imaging analysis.
  • Figure 4: Qualitative comparison on the RESIDE-ITS dataset \cite{li2018benchmarking}. (a) input frames, (b) DehazeFormer-B, (c) Restormer, (d) ResShift, (e) ours, and (f) ground truth. Compared with RGB-only methods, EvDehaze recovers sharper edges and more consistent contrast near windows, blinds and outdoor structures, while reducing residual haze and color distortion, and is visually closer to the ground truth.
  • Figure 5: Real-world results on our RGB–event drone dataset under heavy haze. (a) and (c) are hazy inputs with overlaid event activity, and (b) and (d) are the dehazed outputs of our event-guided diffusion model EvDehaze. Our method removes large-scale haze and recovers clearer edges and textures in distant buildings and roads; the histogram (right) shows an expanded intensity range and improved global contrast compared with the input.
  • ...and 3 more figures
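
The Figure 2 caption says the denoising U-Net is guided by event features via cross-attention. The arithmetic of that step can be sketched as below. This is a minimal illustration, not the authors' module: the names `cross_attention`, `latent_tokens`, and `event_tokens` are hypothetical, the learned query/key/value projections are omitted (treated as identity), and the residual connection is an assumption borrowed from standard transformer blocks.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(latent_tokens, event_tokens):
    """Let each image-latent token (query) attend over all event tokens.

    Assumption: Q/K/V projections are identity, so event tokens serve as
    both keys and values. Returns the fused tokens and attention weights.
    """
    dim = len(event_tokens[0])
    scale = 1.0 / math.sqrt(dim)  # standard scaled dot-product factor
    fused, weights = [], []
    for query in latent_tokens:
        # Similarity between this latent token and every event token.
        scores = [
            scale * sum(q * k for q, k in zip(query, key))
            for key in event_tokens
        ]
        attn = softmax(scores)
        # Attention-weighted sum of event tokens (the values).
        context = [
            sum(attn[j] * event_tokens[j][d] for j in range(len(event_tokens)))
            for d in range(dim)
        ]
        # Assumed residual connection: add the event context to the latent.
        fused.append([q + c for q, c in zip(query, context)])
        weights.append(attn)
    return fused, weights


# Toy data: two latent tokens, three event tokens, feature dimension 2.
latents = [[1.0, 0.0], [0.0, 1.0]]
events = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
fused, attn = cross_attention(latents, events)
```

In this toy example, each latent token puts most of its attention on the event token aligned with it, which is the sense in which sparse event features (e.g., edges) can steer specific latent positions during denoising.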