Learning A Physical-aware Diffusion Model Based on Transformer for Underwater Image Enhancement

Chen Zhao, Chenyu Dong, Weiling Cai

TL;DR

This work tackles underwater image enhancement (UIE) by integrating physical underwater imaging mechanisms into diffusion-based restoration. The authors propose PA-Diff, a three-branch framework comprising Physics Prior Generation (PPG) for transmission and background-light priors, Implicit Neural Reconstruction (INR) for robust feature representations, and a Physics-aware Diffusion Transformer (PDT) that fuses priors with diffusion via physics-guided self-attention, cross-attention, and a physics perception unit. Key contributions include the first physics-guided diffusion model for UIE, the plug-and-play Physics Perception Unit (PPU), and extensive ablations showing each component’s effectiveness across real-world underwater datasets, achieving state-of-the-art results. The approach enhances not only quantitative metrics but also perceptual quality, improving robustness across varied underwater scenes and supporting downstream vision tasks in underwater environments.

Abstract

Underwater images suffer from various complex degradations, which inevitably reduce the performance of underwater vision tasks. Recently, diffusion models have been applied to underwater image enhancement (UIE) tasks and have achieved SOTA performance. However, these methods fail to consider the physical properties and underwater imaging mechanisms in the diffusion process, limiting the information completion capacity of diffusion models. In this paper, we introduce a novel UIE framework, named PA-Diff, designed to exploit the knowledge of physics to guide the diffusion process. PA-Diff consists of a Physics Prior Generation (PPG) Branch, an Implicit Neural Reconstruction (INR) Branch, and a Physics-aware Diffusion Transformer (PDT) Branch. The PPG Branch produces the physics prior knowledge. By using this prior knowledge to guide the diffusion process, the PDT Branch becomes underwater-aware and can model the complex distribution of real-world underwater scenes. The INR Branch learns robust feature representations from diverse underwater images via implicit neural representation, which reduces the difficulty of restoration for the PDT Branch. Extensive experiments show that our method achieves the best performance on UIE tasks.
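
To make the idea of an implicit neural representation concrete, the following is a minimal sketch of a coordinate-based network that maps pixel coordinates to RGB values. It is a generic illustration with random, untrained weights, not the INR Branch of PA-Diff: the Fourier encoding, the layer sizes, and all function names (`make_coords`, `fourier_features`, `init_mlp`, `inr_forward`) are assumptions introduced here.

```python
import numpy as np

def make_coords(height, width):
    """Return an (H*W, 2) array of pixel coordinates normalized to [-1, 1]."""
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([grid_y.ravel(), grid_x.ravel()], axis=-1)

def fourier_features(coords, num_bands=4):
    """Encode coordinates with sin/cos at geometrically spaced frequencies."""
    freqs = (2.0 ** np.arange(num_bands)) * np.pi        # (num_bands,)
    angles = coords[:, :, None] * freqs                  # (N, 2, num_bands)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(coords.shape[0], -1)            # (N, 4 * num_bands)

def init_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Random weights for a two-layer MLP (hypothetical sizes, untrained)."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(0.0, 1.0 / np.sqrt(hidden_dim), (hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }

def inr_forward(params, coords, num_bands=4):
    """Map (N, 2) coordinates to (N, 3) RGB values in the open interval (0, 1)."""
    feats = fourier_features(coords, num_bands)
    hidden = np.maximum(feats @ params["w1"] + params["b1"], 0.0)  # ReLU
    logits = hidden @ params["w2"] + params["b2"]
    return 1.0 / (1.0 + np.exp(-logits))                           # sigmoid

height, width = 8, 12
num_bands = 4
coords = make_coords(height, width)
params = init_mlp(in_dim=2 * 2 * num_bands, hidden_dim=32, out_dim=3)
image = inr_forward(params, coords, num_bands).reshape(height, width, 3)
```

Because the image is represented as a function of continuous coordinates, the same parameters can be queried on a grid of any resolution, for example `inr_forward(params, make_coords(16, 24))`.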

Paper Structure

This paper contains 15 sections, 36 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Overall framework of the proposed PA-Diff. PA-Diff consists of three cooperative branches: the Physics Prior Generation (PPG) Branch, the Implicit Neural Reconstruction (INR) Branch, and the Physics-aware Diffusion Transformer (PDT) Branch. The PPG Branch produces the physics prior knowledge. By using this prior knowledge to guide the diffusion process, the PDT Branch becomes underwater-aware and can model the complex distribution of real-world underwater scenes. The INR Branch learns robust feature representations from diverse underwater images, which reduces the difficulty of restoration for diffusion models.
  • Figure 2: The detailed architecture of our designed (a) physics-aware self-attention (PA-SA) and (b) gated multi-scale feed-forward network (GM-FFN).
  • Figure 3: The detailed structure of the proposed physics perception unit (PPU).
  • Figure 4: The visual results of the background scattered light $B^c$, the medium transmission map $T^c(x)$, and the implicit neural output $I_{inr}$.
  • Figure 5: Qualitative comparison with other SOTA methods on the UIEBD and LSUI datasets.
  • ...and 2 more figures
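
The background light $B^c$ and transmission map $T^c(x)$ shown in Figure 4 are the quantities of the simplified underwater image formation model commonly used in the UIE literature, $I^c(x) = J^c(x)T^c(x) + B^c(1 - T^c(x))$, where $I^c$ is the observed image and $J^c$ the clean scene for color channel $c$. This listing does not state the exact formulation used by PA-Diff, so the sketch below assumes that model; the function names and the `t_min` floor are hypothetical.

```python
import numpy as np

def degrade(clean, transmission, background_light):
    """Apply the assumed model I = J * T + B * (1 - T), channel-wise.

    clean:            (H, W, 3) scene radiance J in [0, 1]
    transmission:     (H, W, 3) medium transmission map T in (0, 1]
    background_light: (3,) background scattered light B per channel
    """
    return clean * transmission + background_light * (1.0 - transmission)

def restore(observed, transmission, background_light, t_min=0.1):
    """Invert the assumed model: J = (I - B * (1 - T)) / T.

    T is floored at t_min (a hypothetical safeguard) so that pixels with
    very low transmission do not amplify noise through the division.
    """
    t = np.maximum(transmission, t_min)
    restored = (observed - background_light * (1.0 - t)) / t
    return np.clip(restored, 0.0, 1.0)

# Synthetic example: random scene, per-channel transmission, bluish-green light.
rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 1.0, size=(4, 5, 3))
transmission = rng.uniform(0.3, 0.9, size=(4, 5, 3))
background_light = np.array([0.1, 0.5, 0.6])

observed = degrade(clean, transmission, background_light)
recovered = restore(observed, transmission, background_light)
```

With exact priors the inversion recovers the clean scene up to floating-point error. In practice $T^c(x)$ and $B^c$ must be estimated, which is the stated role of the PPG Branch.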