Geometric Distortion Guided Transformer for Omnidirectional Image Super-Resolution

Cuixin Yang; Rongkang Dong; Jun Xiao; Cong Zhang; Kin-Man Lam; Fei Zhou; Guoping Qiu

TL;DR

This paper tackles omnidirectional image super-resolution by explicitly modeling the geometric distortion introduced by equirectangular projection (ERP). It introduces GDGT-OSR, a distortion-guided transformer framework that combines a Distortion Guidance Generator, Distortion-Modulated Rectangle-window Self-Attention, Distortion-aware Deformable Self-Attention, and Dynamic Feature Aggregation to capture self-similar textures across latitudes. The architecture builds on a formal distortion model, comprising the stretching ratio $R_{ERP}$ and the distortion map $D$, and is trained with a distortion-weighted WS-$\ell_1$ loss that accounts for non-uniform pixel density. Empirical results on public ODI datasets show state-of-the-art performance for 2× and 4× super-resolution and robust performance at larger scale factors, with ablations confirming the contribution of each module. The work advances practical omnidirectional SR for VR/AR by enabling broader attention ranges and distortion-aware texture reconstruction across latitudes.
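To make the idea of a distortion-weighted loss concrete, here is a minimal NumPy sketch of a latitude-weighted $\ell_1$ error for ERP images. The paper's exact WS-$\ell_1$ formula is not given in this summary, so the sketch assumes the cosine-of-latitude row weighting used by the WS-PSNR metric; the function names `erp_latitude_weights` and `ws_l1_loss` are hypothetical and chosen only for illustration.

```python
import numpy as np


def erp_latitude_weights(height: int) -> np.ndarray:
    """Return cos(latitude) at the centre of each ERP row.

    Assumption: this is the WS-PSNR-style weighting, not necessarily
    the exact weighting used in the paper.
    """
    rows = np.arange(height)
    # Row centres map linearly to latitudes in (-pi/2, pi/2).
    latitude = (rows + 0.5 - height / 2.0) * np.pi / height
    return np.cos(latitude)


def ws_l1_loss(sr: np.ndarray, hr: np.ndarray) -> float:
    """Latitude-weighted mean absolute error for (H, W) or (H, W, C) arrays."""
    if sr.shape != hr.shape:
        raise ValueError("sr and hr must have the same shape")
    weights = erp_latitude_weights(sr.shape[0])
    # Reshape to (H, 1) or (H, 1, 1) so the weights broadcast over columns/channels.
    weights = weights.reshape((-1,) + (1,) * (sr.ndim - 1))
    abs_err = np.abs(sr - hr)
    w_full = np.broadcast_to(weights, abs_err.shape)
    # Normalise by the total weight so a uniform error keeps its magnitude.
    return float((w_full * abs_err).sum() / w_full.sum())


if __name__ == "__main__":
    hr = np.zeros((8, 16))
    polar, equator = hr.copy(), hr.copy()
    polar[0, :] = 1.0     # error confined to the top (polar) row
    equator[4, :] = 1.0   # same error confined to a row next to the equator
    print(ws_l1_loss(polar, hr), ws_l1_loss(equator, hr))
```

Under this weighting, an error near the poles, where ERP oversamples the sphere, contributes less to the loss than the same error near the equator.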

Abstract

As virtual and augmented reality applications gain popularity, omnidirectional image (ODI) super-resolution has become increasingly important. Unlike ordinary 2D images, which are formed on a plane, ODIs are projected onto spherical surfaces. Applying established image super-resolution methods to ODIs therefore requires equirectangular projection (ERP) to map the ODIs onto a plane, and ODI super-resolution must take into account the geometric distortion that ERP introduces. Because previous deep-learning-based methods do not consider this distortion, they utilize only a limited range of pixels and may easily miss self-similar textures useful for reconstruction. In this paper, we introduce a novel Geometric Distortion Guided Transformer for Omnidirectional image Super-Resolution (GDGT-OSR). Specifically, a distortion-modulated rectangle-window self-attention mechanism, integrated with deformable self-attention, is proposed to better perceive the distortion and thus involve more self-similar textures. Distortion modulation is achieved through a newly devised distortion guidance generator that produces guidance by exploiting the variability of distortion across latitudes. Furthermore, we propose a dynamic feature aggregation scheme to adaptively fuse the features from different self-attention modules. We present extensive experimental results on public datasets and show that GDGT-OSR outperforms existing methods.

Paper Structure (24 sections, 21 equations, 15 figures, 11 tables)

Introduction
Related Works
Single Image Super-Resolution
Omnidirectional Image Super-Resolution
Vision Transformer
Methodology
Preliminaries
Stretching Ratio
Distortion Map
Architecture
Overview
Distortion Guidance Generator (DGG)
Distortion Modulated Rectangle-Window Self-Attention (DMRSA)
Dynamic Feature Aggregation (DFA)
Loss Function
...and 9 more sections

Figures (15)

Figure 1: Comparison of local attribution maps [gu2021interpreting] and SR results among different methods. The local attribution maps represent the importance of each pixel in reconstructing the patch in the red box. The Diffusion Index (DI) is shown below the local attribution maps. A higher DI value indicates a wider range of the involved pixels. The second row shows the Area of Contribution, which implies the areas involved and their contributions. The local attribution maps, DI values, and Area of Contribution collectively demonstrate that our proposed method engages more pixels in the reconstruction. This contributes to restoring more realistic details, leading to improved SR performance.
Figure 2: Geometric explanation of the relationship between ERP (left) and the sphere, as well as the relationship between the sphere and the tangential cube (right).
Figure 3: Distortion map. A lighter area represents less distortion, while a darker area represents higher distortion.
Figure 4: Overview of the GDGT-OSR architecture (upper part) and the detailed structure of the DAB (bottom part).
Figure 5: Illustration of the Distortion Guidance Generator (DGG).
...and 10 more figures
