Burst Super-Resolution with Diffusion Models for Improving Perceptual Quality
Kyotaro Tokoro, Kazutoshi Akita, Norimichi Ukita
TL;DR
This paper addresses blurry outputs in burst super-resolution (SR) by introducing Burst SR with Diffusion Model (BSRD), which conditions the reverse diffusion process on features extracted from the burst of low-resolution (LR) images and starts reconstruction from an intermediate step so that sampling concentrates on texture detail. BSRD borrows Burstormer-style feature extraction and alignment and injects the burst features into the diffusion U-Net through Spatial Feature Transformation, achieving sharper boundaries and textures while reducing computational cost. Experiments on the SyntheticBurst and BurstSR datasets show improved perceptual quality (lower LPIPS and FID) at the cost of worse distortion scores (PSNR/SSIM), a trade-off that favors perceptual quality. The work advances burst SR by combining probabilistic modeling with multi-frame cues, is relevant to real-world imaging pipelines, and points to latent diffusion and efficiency-focused refinements as future directions.
Abstract
While burst LR images are more useful than a single LR image for improving SR image quality, prior SR networks that accept burst LR images are trained in a deterministic manner, which is known to produce blurry SR images. In addition, burst LR images are difficult to align perfectly, which makes the SR image even blurrier. Since such blurry images are perceptually degraded, we aim to reconstruct sharp, high-fidelity boundaries. Such high-fidelity images can be reconstructed by diffusion models. However, prior SR methods that use diffusion models are not properly optimized for the burst SR task. Specifically, a reverse process that starts from a random sample is not well suited to image enhancement and restoration tasks, including burst SR. In our proposed method, by contrast, burst LR features are used to reconstruct an initial burst SR image that is fed into an intermediate step of the diffusion model. Starting the reverse process from this intermediate step 1) skips the diffusion steps that reconstruct the global structure of the image and 2) focuses on the steps that refine detailed textures. Our experimental results demonstrate that our method improves the scores of perceptual quality metrics. Code: https://github.com/placerkyo/BSRD
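The idea of entering the reverse process at an intermediate step can be illustrated with a minimal NumPy sketch of standard DDPM sampling. This is not the authors' implementation: the linear noise schedule, the step count of 1000, the start step of 300, and every function name below are assumptions chosen for illustration, and `oracle_eps_model` is a hypothetical stand-in for the trained U-Net conditioned on burst LR features. The initial SR estimate is first noised to step `t_start` with the closed-form forward process, and only the remaining reverse steps are run.

```python
import numpy as np


def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Linear DDPM noise schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def noise_to_step(x0, t, alpha_bars, rng):
    """Closed-form forward diffusion: sample x_t given x_0."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps


def reverse_from_intermediate(initial_sr, eps_model, t_start, schedule, rng):
    """Noise the initial SR estimate to step t_start, then denoise down to step 0.

    Steps above t_start, which would rebuild global structure from pure noise,
    are never executed.
    """
    betas, alphas, alpha_bars = schedule
    x = noise_to_step(initial_sr, t_start, alpha_bars, rng)
    steps_run = 0
    for t in range(t_start, -1, -1):
        eps_hat = eps_model(x, t)
        # Posterior mean of the standard DDPM reverse step.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # No noise is added at the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
        steps_run += 1
    return x, steps_run


# Toy usage: a hypothetical "oracle" noise predictor that knows the sharp target.
# A real system would use a trained network conditioned on burst LR features.
rng = np.random.default_rng(0)
schedule = make_schedule()
target = rng.random((8, 8))                               # stands in for the sharp HR image
initial_sr = target + 0.1 * rng.standard_normal((8, 8))   # stands in for the initial SR estimate


def oracle_eps_model(x, t):
    _, _, alpha_bars = schedule
    return (x - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])


sr, steps_run = reverse_from_intermediate(initial_sr, oracle_eps_model, 300, schedule, rng)
print(steps_run)  # number of reverse steps executed (t_start + 1)
```

With `t_start = 300`, the loop evaluates the noise predictor 301 times rather than the 1000 evaluations a full reverse process would need under this schedule, which is where the saving in sampling cost comes from. The choice of `t_start` trades off how much of the initial estimate is preserved against how much freedom the model has to synthesize texture.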
