Burst Super-Resolution with Diffusion Models for Improving Perceptual Quality

Kyotaro Tokoro, Kazutoshi Akita, Norimichi Ukita

TL;DR

This paper tackles blurry outputs in burst SR by introducing Burst SR with Diffusion Model (BSRD), which conditions the reverse diffusion process on burst LR features and starts reconstruction from an intermediate step so that sampling concentrates on texture detail. BSRD borrows Burstormer's feature extraction and alignment modules and applies Spatial Feature Transform (SFT) conditioning inside the diffusion U-Net, yielding sharper boundaries and textures while reducing computational cost. Experiments on the SyntheticBurst and BurstSR datasets show better perceptual quality (lower LPIPS and FID) at the cost of some loss in PSNR/SSIM, a trade-off that favors perceptual quality over distortion metrics. The work combines probabilistic modeling with multi-frame cues, offers practical gains in perceptual quality for real-world imaging pipelines, and points to latent diffusion and efficiency-focused refinements as future directions.

Abstract

While burst LR images can improve SR image quality compared with a single LR image, prior SR networks that accept burst LR images are trained in a deterministic manner, which is known to produce blurry SR images. In addition, it is difficult to align burst LR images perfectly, which makes the SR image even blurrier. Since such blurry images are perceptually degraded, we aim to reconstruct sharp, high-fidelity boundaries. Such high-fidelity images can be reconstructed by diffusion models. However, prior SR methods using diffusion models are not properly optimized for the burst SR task. Specifically, a reverse process starting from a random sample is not optimized for image enhancement and restoration tasks, including burst SR. In our proposed method, by contrast, burst LR features are used to reconstruct an initial burst SR image that is fed into an intermediate step of the diffusion model. This reverse process from the intermediate step 1) skips the diffusion steps that reconstruct the global structure of the image and 2) focuses on the steps that refine detailed textures. Our experimental results demonstrate that our method improves the scores of perceptual quality metrics. Code: https://github.com/placerkyo/BSRD
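
The idea of entering the reverse process at an intermediate step can be illustrated with a small sketch. It is not the authors' implementation: it assumes the standard DDPM forward process $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$ with a linear noise schedule, zero-based step indices, and images flattened to lists of floats. The names `linear_betas`, `noise_to_step`, `reverse_from_intermediate`, and the `denoise_step` callback are hypothetical and stand in for the conditioned U-Net update.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linear noise schedule. The default range is the common DDPM choice,
    assumed here for illustration and not taken from the paper."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] is the running product of (1 - beta_s) for s <= t."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noise_to_step(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    abar = alpha_bars[t]
    signal, sigma = math.sqrt(abar), math.sqrt(1.0 - abar)
    return [signal * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


def reverse_from_intermediate(initial_sr, t_start, alpha_bars, denoise_step, rng):
    """Noise the initial SR estimate to step t_start, then denoise down to 0.

    denoise_step(x, t) is a hypothetical callback for one reverse update;
    steps above t_start are never executed.
    """
    x = noise_to_step(initial_sr, t_start, alpha_bars, rng)
    for t in range(t_start, -1, -1):
        x = denoise_step(x, t)
    return x


if __name__ == "__main__":
    total_steps, t_start = 1000, 299  # illustrative values only
    alpha_bars = cumulative_alphas(linear_betas(total_steps))
    identity = lambda x, t: x  # placeholder for a trained denoiser
    out = reverse_from_intermediate([0.2, 0.8, 0.5], t_start, alpha_bars,
                                    identity, random.Random(0))
    print(f"ran {t_start + 1} of {total_steps} reverse steps, output size {len(out)}")
```

With these illustrative values, only the last 300 of the 1000 reverse steps run, which is where the skipped computation comes from; how the starting step is chosen in BSRD is not specified in this excerpt.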

Paper Structure (24 sections, 3 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Comparison between prior models and our model (i.e., BSRD).
  • Figure 2: Feature extraction and alignment in Burstormer [DBLP:conf/cvpr/DudhaneZ0K023], colored red and yellow, respectively. Different colors in the feature extraction process indicate spatial displacements.
  • Figure 3: Overview of the feature extraction, alignment, fusion, and reconstruction processes in BSRD. The feature extraction and alignment modules are borrowed from Burstormer and are shown in Fig. 2. In our proposed fusion module, SFT [DBLP:conf/cvpr/WangYDL18] is included in the U-Net and used for conditioning on the LR features. Reconstruction is performed by the reverse process of the diffusion model.
  • Figure 4: Reverse process from the intermediate step. Instead of starting the reverse process from the $T$-th step, the initial burst SR image is appropriately noised and fed into the diffusion model at the $t$-th step.
  • Figure 5: Comparison between the linear and sigmoid schedulers.
  • ...and 3 more figures
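
The SFT conditioning mentioned in the Figure 3 caption can be sketched as an element-wise affine modulation, $F' = \gamma \odot F + \beta$, where $\gamma$ and $\beta$ are predicted from the conditioning features. The snippet below is a minimal illustration under assumptions: feature maps are flattened to lists of floats, and `predict_modulation` is a hypothetical scalar affine map standing in for the small learned layers that would normally produce $\gamma$ and $\beta$. How BSRD wires SFT into its U-Net is not described in this excerpt.

```python
def predict_modulation(condition, w_gamma, b_gamma, w_beta, b_beta):
    """Hypothetical stand-in for the learned layers that map conditioning
    features (here, burst LR features) to a scale gamma and a shift beta."""
    gamma = [w_gamma * c + b_gamma for c in condition]
    beta = [w_beta * c + b_beta for c in condition]
    return gamma, beta


def sft_modulate(features, gamma, beta):
    """Apply the SFT modulation element-wise: out = gamma * features + beta."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("features, gamma, and beta must have the same length")
    return [g * f + b for f, g, b in zip(features, gamma, beta)]


if __name__ == "__main__":
    unet_features = [1.0, 2.0, -1.0]  # toy intermediate U-Net activations
    lr_condition = [0.0, 1.0, 2.0]    # toy aligned burst LR features
    gamma, beta = predict_modulation(lr_condition, 0.5, 1.0, 0.25, 0.0)
    print(sft_modulate(unet_features, gamma, beta))
```

Because the modulation is spatially varying, each location of the U-Net features can be scaled and shifted differently depending on the LR features at that location, which is what distinguishes SFT from a single global conditioning vector.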