OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model

Runyi Li, Xuhan Sheng, Weiqi Li, Jian Zhang

TL;DR

OmniSSR tackles zero-shot omnidirectional image super-resolution by bridging omnidirectional ERP data and planar TP priors through Octadecaplex Tangent Information Interaction (OTII). It iteratively denoises TP representations with a Stable Diffusion backbone and enforces fidelity-realness balance via Gradient Decomposition (GD) corrections, applied both during sampling and post-processing. The method achieves competitive fidelity and superior perceptual realism compared with diffusion-based and supervised baselines on ODI-SR and SUN 360, while requiring no training or fine-tuning on ODI data. This training-free approach reduces data requirements and supports cross-domain generalization, with potential extensions to ODI editing, inpainting, and 3D scene enhancements.

Abstract

Omnidirectional images (ODIs) are commonly used in real-world visual tasks, and high-resolution ODIs improve performance on those tasks. Most existing super-resolution methods for ODIs rely on end-to-end learning, which yields images with inferior realness and generalizes poorly to out-of-domain data. Image generation methods, represented by diffusion models, provide strong priors for visual tasks and have proven effective for image restoration. Leveraging the image priors of the Stable Diffusion (SD) model, we achieve omnidirectional image super-resolution with both fidelity and realness, dubbed OmniSSR. First, we transform the equirectangular projection (ERP) images into tangent projection (TP) images, whose distribution approximates the planar image domain. Then, we use SD to iteratively sample initial high-resolution results. At each denoising iteration, we further correct and update the initial results using the proposed Octadecaplex Tangent Information Interaction (OTII) and Gradient Decomposition (GD) techniques to ensure better consistency. Finally, the TP images are transformed back to ERP format to obtain the final high-resolution results. Our method is zero-shot, requiring no training or fine-tuning. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed method.
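The ERP-to-TP step in the abstract rests on the gnomonic (tangent-plane) projection. The following is a minimal sketch of that mapping for a single point, using the standard textbook formulas rather than the authors' implementation. The function names and the ERP pixel convention (longitude in [-pi, pi), row 0 at the north pole) are assumptions made for illustration.

```python
import math


def erp_pixel_to_lonlat(u, v, width, height):
    """Map an ERP pixel centre to (longitude, latitude) in radians.

    Assumed convention: longitude spans [-pi, pi) left to right and
    latitude spans [pi/2, -pi/2] top to bottom.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    return lon, lat


def gnomonic_forward(lon, lat, lon0, lat0):
    """Project a sphere point onto the plane tangent at (lon0, lat0).

    Returns None when the point lies on the far hemisphere, where the
    gnomonic projection is undefined.
    """
    cos_c = (math.sin(lat0) * math.sin(lat)
             + math.cos(lat0) * math.cos(lat) * math.cos(lon - lon0))
    if cos_c <= 0.0:
        return None
    x = math.cos(lat) * math.sin(lon - lon0) / cos_c
    y = (math.cos(lat0) * math.sin(lat)
         - math.sin(lat0) * math.cos(lat) * math.cos(lon - lon0)) / cos_c
    return x, y


def gnomonic_inverse(x, y, lon0, lat0):
    """Map tangent-plane coordinates back to (longitude, latitude)."""
    rho = math.hypot(x, y)
    if rho == 0.0:
        return lon0, lat0
    c = math.atan(rho)
    sin_lat = (math.cos(c) * math.sin(lat0)
               + y * math.sin(c) * math.cos(lat0) / rho)
    lat = math.asin(max(-1.0, min(1.0, sin_lat)))  # clamp against rounding
    lon = lon0 + math.atan2(
        x * math.sin(c),
        rho * math.cos(lat0) * math.cos(c) - y * math.sin(lat0) * math.sin(c),
    )
    return lon, lat
```

Producing a full TP image would apply the inverse mapping to every pixel of a tangent patch and resample the ERP image at the resulting coordinates. The interpolation scheme and the placement of the tangent points are not specified in this section, so they are left out of the sketch.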

Paper Structure (31 sections, 9 equations, 11 figures, 7 tables, 2 algorithms)

Figures (11)

  • Figure 1: We address omnidirectional image super-resolution in a zero-shot manner via OmniSSR. Presented above are selected results that illustrate OmniSSR in comparison with the current state-of-the-art approach, OSRT (Yu et al., 2023). Parts (a) and (b) show that OmniSSR upholds fidelity and visual realness at the same time, providing vivid and realistic details, while OSRT outputs over-smoothed and distorted results. Zoom in for more details.
  • Figure 2: Details of the gnomonic transformations. (a) Conversion from ERP to TP. (b) Pre-upsampling, proposed as part of Octadecaplex Tangent Information Interaction (OTII), which mitigates information loss during transformation.
  • Figure 3: Overview of our proposed OmniSSR. The input low-resolution omnidirectional image $\mathbf{E}_{init}$ in ERP format is first projected onto Tangent Projection (TP) images $\mathbf{x}_{init}^{(1)}, \mathbf{x}_{init}^{(2)}, \dots, \mathbf{x}_{init}^{(m)}$, which are then iteratively refined via Stable Diffusion (SD) with a time-aware adapter and a controllable feature wrapping (CFW) module. At each step of diffusion sampling, we adopt the Gradient Decomposition (GD) correction technique to impose consistency constraints on the restored intermediate results. After $T$ steps of sampling, we obtain the final result $\tilde{\mathbf{E}}_{0}$ with high resolution and better visual quality.
  • Figure 4: Visualized comparison of $\times$2 and $\times$4 SR results on the SUN 360 test set. 001 and 009 are the ID numbers in the test set filenames. We also calculate the PSNR and SSIM between the HR ground truth and each SR result as well as the downsampled image.
  • Figure 5: Visualized comparison of $\times$2 and $\times$4 SR results on the ODI-SR test set. 067 and 049 are the ID numbers in the test set filenames. We also calculate the PSNR and SSIM between the HR ground truth and each SR result as well as the downsampled image.
  • ...and 6 more figures
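
The captions for Figures 4 and 5 report PSNR between the HR ground truth and each SR result. As a reminder of what that number measures, here is a minimal sketch of the standard PSNR definition over flat pixel sequences. The function name and the 8-bit peak value of 255 are assumptions, and the sketch applies no latitude weighting; it is not claimed to match the paper's evaluation code.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    `peak` is the maximum possible pixel value (255 assumed for 8-bit images).
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

A higher PSNR means the SR result is closer to the ground truth pixel by pixel, which is why it serves as a fidelity measure rather than a measure of perceptual realness.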