Prompt-SID: Learning Structural Representation Prompt via Latent Diffusion for Single-Image Denoising

Huaqiu Li, Wang Zhang, Xiaowan Hu, Tao Jiang, Zikang Chen, Haoqian Wang

TL;DR

Prompt-SID tackles self-supervised single-image denoising by learning a structural representation prompt that retains the high-frequency details otherwise lost during downsampling. It leverages RG-Diff, a latent diffusion process, to generate a structural prompt from degraded inputs and fuses this prompt into restoration via a Structural Attention Module within a Transformer denoiser. A scale replay mechanism aligns downsampled and original-scale restorations during training, improving cross-scale generalization. Across synthetic, real-world, and fluorescence imaging datasets, Prompt-SID delivers state-of-the-art performance among self-supervised methods and competitive results against supervised baselines, with reported improvements in detail preservation and edge fidelity.

Abstract

Many studies have concentrated on constructing supervised models from paired datasets for image denoising, but collecting such pairs is expensive and time-consuming. Current self-supervised and unsupervised approaches typically rely on blind-spot networks or sub-image pair sampling, which lose pixel information and destroy detailed structural information, significantly limiting their effectiveness. In this paper, we introduce Prompt-SID, a prompt-learning-based single-image denoising framework that emphasizes preserving structural details. This approach is trained in a self-supervised manner using downsampled image pairs. It captures original-scale image information through structural encoding and integrates this prompt into the denoiser. To achieve this, we propose a structural representation generation model based on the latent diffusion process and design a structural attention module within the transformer-based denoiser architecture to decode the prompt. Additionally, we introduce a scale replay training mechanism, which effectively mitigates the scale gap between images of different resolutions. We conduct comprehensive experiments on synthetic, real-world, and fluorescence imaging datasets, showcasing the effectiveness of Prompt-SID. Our code will be released at https://github.com/huaqlili/Prompt-SID.
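
To make the idea of training on downsampled image pairs concrete, the following is a minimal sketch of one common way to draw sub-images from a single noisy image: split it into 2x2 cells and take one pixel per cell at a fixed offset. The function name `subsample`, the fixed offsets, and the toy input are assumptions for illustration only; the paper's actual sampling strategy may differ (for example, it may pick positions randomly per cell).

```python
import numpy as np

def subsample(image, offset):
    """Take one pixel from every 2x2 cell at the given (dy, dx) offset.

    Hypothetical helper: a fixed-offset stand-in for a sub-image sampler.
    The result has half the height and width of the (even-cropped) input.
    """
    dy, dx = offset
    h, w = image.shape[:2]
    # Crop to even dimensions so every offset yields the same output shape.
    return image[dy:h - h % 2:2, dx:w - w % 2:2]

# Toy noisy image (assumed data, not from the paper).
rng = np.random.default_rng(0)
noisy = 0.5 + 0.1 * rng.standard_normal((8, 8))

# Two sub-images drawn from different positions in each cell. They share
# almost the same underlying signal but carry different noise samples,
# which is what lets one serve as the training target for the other.
sub_a = subsample(noisy, (0, 0))
sub_b = subsample(noisy, (1, 1))
```

Each sub-image keeps only a quarter of the pixels, which illustrates the information loss that the structural prompt from the original-scale image is meant to compensate for.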

Paper Structure

This paper contains 15 sections, 11 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Comparison of Prompt-SID with other self-supervised image denoising methods in terms of model parameters and experimental results for the noise setting $\sigma \in [5, 50]$.
  • Figure 2: The distinctions between the pipelines of N2N, NBR2NBR, and Prompt-SID.
  • Figure 3: The primary denoising pipeline of Prompt-SID. (a) This method acquires the sub-images for network training through a spatial redundancy sampling strategy. These inputs are denoised using SPIformer, while the original image's structural representation is obtained as a prompt through RG-Diff. Each Transformer block incorporates a SAM to facilitate feature fusion. (b) During inference, Prompt-SID exclusively employs the original scale image through SPIformer and the RG-Diff branch.
  • Figure 4: Diagram of the RG-Diff branch. Initially, PSE encodes the image representation into an implicit space, followed by a diffusion process within this space to obtain $\mathbf{c}_{org(t)}$. Utilizing the representation of $m_1(\mathbf{x})$ as a conditioning factor, RG-Diff guides the restoration of the representation of $\mathbf{x}$. This is achieved by merging $\mathbf{c}_{org(t)}$, $\mathbf{c}_{sub}$, and the timestep $\mathbf{t}$ in the reverse diffusion stage and feeding them into the denoising network.
  • Figure 5: Introducing a scale-replay training branch without gradient backpropagation. We pass the original-scale noisy image $\mathbf{x}$ through Prompt-SID and downsample the denoised result to obtain $m_1(f_\theta(\mathbf{x}))$, $m_2(f_\theta(\mathbf{x}))$, $m_3(f_\theta(\mathbf{x}))$. These downscaled outputs are utilized to enforce regularization constraints on the image restoration loss.
  • ...and 5 more figures
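
The scale-replay idea in the Figure 5 caption can be sketched roughly as follows. This is an illustrative reading, not the authors' loss: the original-scale output $f_\theta(\mathbf{x})$ is computed once and treated as a constant, then downsampled with each $m_k$ and compared against the denoised sub-image $f_\theta(m_k(\mathbf{x}))$. The mean-filter `toy_denoiser`, the three fixed offsets standing in for $m_1, m_2, m_3$, and the squared-error form of the penalty are all assumptions.

```python
import numpy as np

def subsample(image, offset):
    """Hypothetical m_k: one pixel per 2x2 cell at the given (dy, dx) offset."""
    dy, dx = offset
    h, w = image.shape
    return image[dy:h - h % 2:2, dx:w - w % 2:2]

def toy_denoiser(image):
    """Stand-in for f_theta (assumption): 3x3 mean filter with edge padding."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    out = np.zeros_like(image, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0

def scale_replay_penalty(noisy, offsets, denoiser=toy_denoiser):
    """Mean squared gap between f(m_k(x)) and m_k(f(x)) over all offsets."""
    # Original-scale pass. In an autograd framework this branch would be
    # detached, matching "without gradient backpropagation" in the caption.
    replay = denoiser(noisy)
    terms = []
    for offset in offsets:
        sub_output = denoiser(subsample(noisy, offset))  # f(m_k(x))
        replay_target = subsample(replay, offset)        # m_k(f(x))
        terms.append(np.mean((sub_output - replay_target) ** 2))
    return float(np.mean(terms))

rng = np.random.default_rng(0)
noisy = 0.5 + 0.1 * rng.standard_normal((16, 16))  # toy data (assumed)
offsets = [(0, 0), (0, 1), (1, 0)]                 # assumed m_1, m_2, m_3
penalty = scale_replay_penalty(noisy, offsets)
```

In a training loop, a term like `penalty` would be added to the main restoration loss with some weight, so that outputs at the sub-image scale stay consistent with the output at the original scale.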