Geometric Distortion Guided Transformer for Omnidirectional Image Super-Resolution
Cuixin Yang, Rongkang Dong, Jun Xiao, Cong Zhang, Kin-Man Lam, Fei Zhou, Guoping Qiu
TL;DR
This paper tackles omnidirectional image super-resolution by explicitly modeling the distortion introduced by equirectangular projection (ERP). It introduces GDGT-OSR, a distortion-guided transformer framework that combines a Distortion Guidance Generator, Distortion-Modulated Rectangle-window Self-Attention, Distortion-aware Deformable Self-Attention, and Dynamic Feature Aggregation to capture self-similar textures across latitudes. The architecture builds on a formal distortion model, including the stretching ratio $R_{ERP}$ and the distortion map $D$, and is trained with a distortion-weighted WS-$l1$ loss that reflects the non-uniform pixel density of ERP images. Experiments on public ODI datasets show state-of-the-art performance for 2× and 4× super-resolution and robust results at larger scaling factors, and ablations confirm the contribution of each module. The work advances practical omnidirectional SR for VR/AR by enabling broader attention ranges and distortion-aware texture reconstruction across latitudes.
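The TL;DR names the stretching ratio $R_{ERP}$ and the distortion-weighted WS-$l1$ loss but does not define them in this section. What follows is a minimal sketch under stated assumptions, not the paper's implementation. It assumes the standard WS-PSNR row weight $\cos\varphi$, where $\varphi$ is the latitude at the centre of each ERP row. It takes $R_{ERP} = 1/\cos\varphi$ as the horizontal stretch of a row relative to the sphere. It treats WS-$l1$ as a $\cos\varphi$-weighted mean absolute error. The helper names `erp_latitude`, `stretching_ratio`, and `ws_l1_loss` are hypothetical.

```python
import math


def erp_latitude(row: int, height: int) -> float:
    """Latitude (radians) at the centre of an ERP row; row 0 is nearest the north pole."""
    return (0.5 - (row + 0.5) / height) * math.pi


def stretching_ratio(row: int, height: int) -> float:
    """Assumed form of R_ERP: how much ERP stretches a row horizontally versus the sphere.

    A latitude circle of circumference 2*pi*cos(lat) is drawn over the full image
    width, so the stretch grows as 1 / cos(lat) toward the poles (assumption).
    """
    return 1.0 / math.cos(erp_latitude(row, height))


def ws_l1_loss(sr, hr):
    """Hypothetical WS-l1: absolute error weighted by cos(latitude) of each pixel's row.

    sr, hr: equally sized 2-D lists (height x width) of pixel values.
    Returns the weighted mean absolute error, so oversampled polar rows count less.
    """
    height = len(hr)
    weighted_error = 0.0
    total_weight = 0.0
    for r in range(height):
        w = math.cos(erp_latitude(r, height))  # WS-PSNR-style row weight (assumption)
        for a, b in zip(sr[r], hr[r]):
            weighted_error += w * abs(a - b)
            total_weight += w
    return weighted_error / total_weight
```

Because polar rows receive small weights, an error confined to the top row of a 4-row image contributes less to `ws_l1_loss` than it would to a plain mean absolute error. This mirrors the idea that oversampled polar pixels represent less of the sphere.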
Abstract
As virtual and augmented reality applications gain popularity, omnidirectional image (ODI) super-resolution has become increasingly important. Unlike plain 2D images, which are formed on a plane, ODIs are projected onto spherical surfaces. Applying established image super-resolution methods to ODIs therefore requires equirectangular projection (ERP) to map the ODIs onto a plane, and ODI super-resolution must take into account the geometric distortion that ERP introduces. However, previous deep-learning-based methods do not consider this distortion; they exploit only a limited range of pixels and can easily miss self-similar textures useful for reconstruction. In this paper, we introduce a novel Geometric Distortion Guided Transformer for Omnidirectional image Super-Resolution (GDGT-OSR). Specifically, a distortion-modulated rectangle-window self-attention mechanism, integrated with deformable self-attention, is proposed to better perceive the distortion and thus involve more self-similar textures. Distortion modulation is achieved through a newly devised distortion guidance generator, which produces guidance by exploiting the variability of distortion across latitudes. Furthermore, we propose a dynamic feature aggregation scheme to adaptively fuse the features from different self-attention modules. Extensive experimental results on public datasets show that GDGT-OSR outperforms existing methods.
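To see why ERP distortion varies with latitude, the mapping from ERP pixels to the sphere can be sketched as follows. This assumes a common ERP convention in which longitude increases left to right over $[-\pi, \pi)$ and latitude decreases from $+\pi/2$ at the top row to $-\pi/2$ at the bottom. The function names are illustrative and are not taken from the paper. Every pixel spans the same longitude interval, but the sphere area it covers depends on its latitude band and collapses toward the poles.

```python
import math


def erp_pixel_to_sphere(col, row, width, height):
    """Map the centre of ERP pixel (col, row) to (longitude, latitude) in radians."""
    lon = ((col + 0.5) / width - 0.5) * 2.0 * math.pi  # in (-pi, pi)
    lat = (0.5 - (row + 0.5) / height) * math.pi       # in (-pi/2, pi/2), north at row 0
    return lon, lat


def sphere_to_unit_vector(lon, lat):
    """Cartesian point on the unit sphere for a (longitude, latitude) pair."""
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def erp_pixel_solid_angle(row, width, height):
    """Exact solid angle (steradians) covered by any pixel in ERP row `row`.

    Each pixel spans 2*pi/width in longitude; the area of its latitude band is
    dlon * (sin(top_edge) - sin(bottom_edge)), which shrinks toward the poles.
    """
    top = (0.5 - row / height) * math.pi
    bottom = (0.5 - (row + 1) / height) * math.pi
    return (2.0 * math.pi / width) * (math.sin(top) - math.sin(bottom))
```

Summing `width * erp_pixel_solid_angle(r, width, height)` over all rows recovers the full sphere, $4\pi$ steradians. For a 1024×512 image, a top-row pixel covers about 326 times less sphere area than a pixel just above the equator. This is the non-uniform pixel density that a distortion-aware method has to account for.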
