Learning A Physical-aware Diffusion Model Based on Transformer for Underwater Image Enhancement
Chen Zhao, Chenyu Dong, Weiling Cai
TL;DR
This work tackles underwater image enhancement (UIE) by integrating physical underwater imaging mechanisms into diffusion-based restoration. The authors propose PA-Diff, a three-branch framework comprising Physics Prior Generation (PPG) for transmission and background-light priors, Implicit Neural Reconstruction (INR) for robust feature representations, and a Physics-aware Diffusion Transformer (PDT) that fuses priors with diffusion via physics-guided self-attention, cross-attention, and a physics perception unit. Key contributions include the first physics-guided diffusion model for UIE, the plug-and-play Physics Perception Unit (PPU), and extensive ablations showing each component’s effectiveness across real-world underwater datasets, achieving state-of-the-art results. The approach enhances not only quantitative metrics but also perceptual quality, improving robustness across varied underwater scenes and supporting downstream vision tasks in underwater environments.
Abstract
Underwater images undergo various complex degradations, inevitably affecting the efficiency of underwater vision tasks. Recently, diffusion models have been applied to underwater image enhancement (UIE) tasks and have achieved state-of-the-art (SOTA) performance. However, these methods fail to consider the physical properties and underwater imaging mechanisms in the diffusion process, which limits the information completion capacity of diffusion models. In this paper, we introduce a novel UIE framework, named PA-Diff, designed to exploit the knowledge of physics to guide the diffusion process. PA-Diff consists of a Physics Prior Generation (PPG) Branch, an Implicit Neural Reconstruction (INR) Branch, and a Physics-aware Diffusion Transformer (PDT) Branch. The PPG branch produces the prior knowledge of physics. By using this physics prior knowledge to guide the diffusion process, the PDT branch gains underwater-aware ability and can model the complex distribution in real-world underwater scenes. The INR branch learns robust feature representations from diverse underwater images via implicit neural representation, which reduces the difficulty of restoration for the PDT branch. Extensive experiments show that our method achieves the best performance on UIE tasks.
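The abstract does not spell out the imaging mechanism it refers to. As a rough illustration only, the sketch below assumes the commonly used simplified underwater image formation model, I = J·t + B·(1 − t), where J is the clean scene, t the transmission, and B the background light (the two priors named in the TL;DR). The function names and the `t_min` clamp are hypothetical and are not taken from PA-Diff.

```python
# Minimal sketch of a simplified underwater image formation model.
# Assumption (not stated in the abstract): I = J * t + B * (1 - t).
# All names here are illustrative, not part of PA-Diff.

def degrade_pixel(clean, transmission, background_light):
    """Apply the assumed formation model to one RGB pixel."""
    return [j * transmission + b * (1.0 - transmission)
            for j, b in zip(clean, background_light)]

def restore_pixel(observed, transmission, background_light, t_min=0.1):
    """Invert the assumed model: J = (I - B) / max(t, t_min) + B."""
    t = max(transmission, t_min)  # clamp to avoid dividing by tiny values
    return [(i - b) / t + b for i, b in zip(observed, background_light)]

clean = [0.8, 0.4, 0.2]             # hypothetical clean RGB values in [0, 1]
background_light = [0.1, 0.5, 0.6]  # hypothetical blue-green water tint
transmission = 0.5                  # hypothetical per-pixel transmission

observed = degrade_pixel(clean, transmission, background_light)
recovered = restore_pixel(observed, transmission, background_light)
```

When the transmission and background light are known exactly, inverting the model recovers the clean pixel up to floating-point error. In practice both quantities must be estimated, which is why they are useful as priors for guiding a learned restoration model.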
