RealOSR: Latent Unfolding Boosting Diffusion-based Real-world Omnidirectional Image Super-Resolution

Xuhan Sheng, Runyi Li, Bin Chen, Weiqi Li, Xu Jiang, Jian Zhang

TL;DR

This work tackles Real-ODISR with RealOSR, a diffusion-based framework that uses single-step denoising to sharply reduce inference time while handling unknown real-world degradations in omnidirectional imagery. It combines a lightweight Domain Alignment Module with a Latent Unfolding Module, so that LR-ERP guidance is applied directly in latent space through degradation-conditioned dynamic convolutions. The method bridges ERP and TP projections to leverage planar priors and uses degradation-aware LoRA to modulate the SD UNet during the single denoising step. It reports higher fidelity and realism than diffusion-based and end-to-end SR baselines, with over 200$\times$ faster inference than the diffusion-based OmniSSR. Together with extensive ablations and robustness analyses, these results position RealOSR as a strong baseline for Real-ODISR and suggest practical potential for high-resolution ODI applications in VR, streaming, and surveillance, while highlighting the need for ODI-specific evaluation metrics and lighter deployments for edge devices.

Abstract

Omnidirectional image super-resolution (ODISR) aims to upscale low-resolution (LR) omnidirectional images (ODIs) to high-resolution (HR), addressing the growing demand for detailed visual content across a $180^{\circ}\times360^{\circ}$ viewport. Existing methods are limited by simple degradation assumptions (e.g., bicubic downsampling), which fail to capture complex, unknown real-world degradation processes. Recent diffusion-based approaches suffer from slow inference because they require hundreds of sampling steps and frequent conversions between pixel and latent space. To tackle these challenges, we propose RealOSR, a novel diffusion-based approach for real-world ODISR (Real-ODISR) with single-step diffusion denoising. To fully exploit the input information, RealOSR introduces a lightweight domain alignment module, which enables efficient injection of the LR ODI into the single-step latent denoising. Additionally, to better utilize the rich semantic and multi-scale feature modeling ability of the denoising UNet, we develop a latent unfolding module that simulates the gradient descent process directly in latent space. Experimental results demonstrate that RealOSR outperforms previous methods in both ODI recovery quality and efficiency. Compared to OmniSSR, the recent state-of-the-art diffusion-based ODISR method, RealOSR achieves significant improvements in visual quality and over 200$\times$ inference acceleration. Our code and models will be released.
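The idea of simulating gradient descent in latent space can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a data-fidelity term $\frac{1}{2}\|\Phi z - y\|^2$ and replaces the learned degradation operator with a fixed, hypothetical stand-in. The names `degrade` (2$\times$2 average pooling), `degrade_T` (its exact adjoint), `unfolding_step`, and the step size `eta` are all illustrative assumptions.

```python
import numpy as np

def degrade(z, s=2):
    """Hypothetical Phi: s x s average pooling (stand-in for a learned operator)."""
    h, w = z.shape
    return z.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def degrade_T(r, s=2):
    """Adjoint of average pooling: spread each value evenly over its s x s block."""
    return np.kron(r, np.ones((s, s))) / (s * s)

def unfolding_step(z, y, eta=1.0, s=2):
    """One gradient-descent step on 0.5 * ||Phi z - y||^2 with respect to z."""
    residual = degrade(z, s) - y              # Phi z - y
    return z - eta * degrade_T(residual, s)   # z - eta * Phi^T (Phi z - y)

# Toy usage: an 8x8 "latent" z and a 4x4 low-resolution observation y.
rng = np.random.default_rng(0)
z = rng.standard_normal((8, 8))
y = rng.standard_normal((4, 4))
z_next = unfolding_step(z, y)
```

For this toy operator, applying `degrade` after `degrade_T` scales its input by 1/4, so each step with `eta=1.0` shrinks the residual `degrade(z) - y` by a factor of 0.75. A learned, degradation-conditioned operator pair would not generally have such a closed-form contraction rate.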

Paper Structure

This paper contains 22 sections, 7 equations, 11 figures, 7 tables, 2 algorithms.

Figures (11)

  • Figure 1: Visual and quantitative comparison of our method with other approaches, demonstrating our method's superior fidelity and visual realism.
  • Figure 2: Bicubic up/down-sampling in latent space, decoded to pixel space for visualization. Despite minor damage, the original information is largely preserved, motivating us to apply deep unfolding directly in latent space.
  • Figure 3: Overall architecture of our proposed RealOSR, with details of its Deep Unfolding Injector Guidance (DUIG). The input LR ERP is first transformed into TP images, which are then sent sequentially through the SD encoder into the denoising UNet with (1) degradation-aware LoRA and (2) DUIG for LR information guidance. All generated TP images are transformed back to ERP format to obtain the final SR result.
  • Figure 4: Details of $\mathbf{\Phi}_{\hat{\theta}}(\cdot)$ and $\mathbf{\Phi}^{\top}_{\hat{\theta}}(\cdot)$, both designed as 3$\times$3 degradation-aware dynamic convolutions. Although their network structures are identical, they do not share learned weights.
  • Figure 5: Visual comparison of SR results on the SUN 360 test set and the ODI-SR test set. 0047 and 0095 are the ID numbers in the test set filenames. Our RealOSR achieves more photo-realistic SR results than other diffusion-based methods.
  • ...and 6 more figures
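
The ERP-to-TP and TP-to-ERP transformations mentioned in the Figure 3 caption can be sketched as follows, assuming TP denotes the standard gnomonic (tangent-plane) projection. The function names, the radian-based interface, and the ERP pixel convention (longitude 0 at the image centre, latitude increasing upwards) are illustrative assumptions and are not taken from the paper.

```python
import math

def tp_to_sphere(x, y, lon0, lat0):
    """Inverse gnomonic projection: tangent-plane (x, y) -> (lon, lat) in radians.
    (lon0, lat0) is the point where the plane touches the unit sphere."""
    rho = math.hypot(x, y)
    if rho == 0.0:
        return lon0, lat0
    c = math.atan(rho)
    lat = math.asin(math.cos(c) * math.sin(lat0)
                    + y * math.sin(c) * math.cos(lat0) / rho)
    lon = lon0 + math.atan2(
        x * math.sin(c),
        rho * math.cos(lat0) * math.cos(c) - y * math.sin(lat0) * math.sin(c))
    return lon, lat

def sphere_to_tp(lon, lat, lon0, lat0):
    """Forward gnomonic projection: (lon, lat) -> tangent-plane (x, y).
    Only valid for points on the hemisphere facing the tangent point."""
    cos_c = (math.sin(lat0) * math.sin(lat)
             + math.cos(lat0) * math.cos(lat) * math.cos(lon - lon0))
    x = math.cos(lat) * math.sin(lon - lon0) / cos_c
    y = (math.cos(lat0) * math.sin(lat)
         - math.sin(lat0) * math.cos(lat) * math.cos(lon - lon0)) / cos_c
    return x, y

def sphere_to_erp(lon, lat, width, height):
    """Map (lon, lat) to continuous ERP pixel coordinates (assumed convention)."""
    lon = (lon + math.pi) % (2 * math.pi) - math.pi  # wrap into [-pi, pi)
    u = (lon / (2 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return u, v

# Toy usage: where does a TP pixel offset land in a 2048 x 1024 ERP image?
lon, lat = tp_to_sphere(0.3, -0.2, lon0=0.5, lat0=0.4)
u, v = sphere_to_erp(lon, lat, 2048, 1024)
```

Resampling an ERP image into a TP patch would evaluate `tp_to_sphere` and `sphere_to_erp` for every TP pixel and interpolate the ERP image at `(u, v)`; the reverse direction uses `sphere_to_tp` in the same way.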