Multi-level distortion-aware deformable network for omnidirectional image super-resolution

Cuixin Yang, Rongkang Dong, Kin-Man Lam, Yuhang Zhang, Guoping Qiu

TL;DR

The paper addresses omnidirectional image super-resolution under ERP-induced distortion by introducing the Multi-level Distortion-aware Deformable Network (MDDN). MDDN combines a distortion-aware deformable cross-attention mechanism with dilated deformable convolutions across multiple sampling levels, and adaptively fuses these features via a multi-level feature fusion module, aided by low-rank decompositions to reduce complexity. Extensive experiments on ODI-SR and Flickr360 datasets demonstrate consistent improvements over both 2D planar SR and existing ODISR methods, with better preservation of edges and textures in highly distorted regions. The work advances ODISR by enabling broader receptive fields while maintaining efficiency, though it notes potential enhancements from integrating semantic guidance in future work.

Abstract

As augmented reality and virtual reality applications gain popularity, image processing for OmniDirectional Images (ODIs) has attracted increasing attention. OmniDirectional Image Super-Resolution (ODISR) is a promising technique for enhancing the visual quality of ODIs. Before performing super-resolution, ODIs are typically projected from a spherical surface onto a plane using EquiRectangular Projection (ERP). This projection introduces latitude-dependent geometric distortion in ERP images: distortion is minimal near the equator but becomes severe toward the poles, where image content is stretched across a wider area. However, existing ODISR methods have limited sampling ranges and feature extraction capabilities, which hinder their ability to capture distorted patterns over large areas. To address this issue, we propose a novel Multi-level Distortion-aware Deformable Network (MDDN) for ODISR, designed to expand the sampling range and receptive field. Specifically, the feature extractor in MDDN comprises three parallel branches: a deformable attention mechanism (serving as the dilation=1 path) and two dilated deformable convolutions with dilation rates of 2 and 3. This architecture expands the sampling range to include more distorted patterns across wider areas, generating dense and comprehensive features that effectively capture geometric distortions in ERP images. The representations extracted from these deformable feature extractors are adaptively fused in a multi-level feature fusion module. Furthermore, to reduce computational cost, a low-rank decomposition strategy is applied to dilated deformable convolutions. Extensive experiments on publicly available datasets demonstrate that MDDN outperforms state-of-the-art methods, underscoring its effectiveness and superiority in ODISR.
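
The latitude dependence described above follows from ERP geometry: a circle of latitude $\phi$ on the sphere has circumference proportional to $\cos\phi$, yet every image row spans the full width, so content is stretched horizontally by roughly $1/\cos\phi$. The sketch below illustrates this per-row stretch factor. It is not the paper's distortion map $D$, whose exact definition is not given in this section, and the function names are hypothetical.

```python
import math

def erp_row_latitudes(height):
    """Latitude in radians at the center of each pixel row.

    Row 0 is the top of the ERP image (near +pi/2, the north pole);
    the last row is near -pi/2 (the south pole).
    """
    return [(0.5 - (i + 0.5) / height) * math.pi for i in range(height)]

def erp_stretch_factors(height):
    """Horizontal stretch of each row relative to the equator: 1 / cos(latitude).

    Assumption: this is the standard geometric stretch of ERP, used here
    only to illustrate the latitude dependence described in the abstract.
    """
    return [1.0 / math.cos(lat) for lat in erp_row_latitudes(height)]

# Tiny example: an ERP image with 8 rows.
stretch = erp_stretch_factors(8)
for row, s in enumerate(stretch):
    print(f"row {row}: stretch = {s:.3f}")
```

Rows near the top and bottom of the image give the largest factors and rows near the middle stay close to 1, matching the statement that distortion is minimal near the equator and severe toward the poles.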

Paper Structure

This paper contains 29 sections, 11 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Sampling of the previous single-level (left) and the proposed multi-level (right) distortion-aware deformable feature extractors. The centers of the regular grids represent the sampling locations of a fixed, regular-shaped kernel. Because of geometric distortion in ERP, image content is dramatically stretched, especially at high latitudes. The single-level deformable feature extractor used in previous methods has a limited sampling range, which makes it difficult to capture diverse patterns. In contrast, the proposed multi-level deformable feature extractor provides a larger sampling range, enabling better adaptation to distortion and more effective feature extraction in highly distorted regions.
  • Figure 2: (a) Relationship between the sphere and the projection plane. (b) Visualization of equirectangular projection. Omnidirectional images are mapped onto a 2D plane through equirectangular projection. (c) Distortion map. Darker areas indicate greater distortion, whereas lighter areas indicate less distortion. Equirectangular projection introduces geometric distortion into the projected images, and the distortion intensifies with increasing latitude. With the equator as the axis of symmetry, the geometric distortion in the Northern and Southern Hemispheres is symmetric.
  • Figure 3: Overview of the architecture of the proposed network. $D$ denotes the distortion map. $d$ represents dilation.
  • Figure 4: Structure of the Multi-level Distortion-aware Deformable Extractor (MDDE). It consists of a distortion-aware deformable cross-attention (DDCA) mechanism and two decomposed dilated distortion-aware deformable convolution (D4C) layers with $d=2$ and $d=3$, together with their offset networks. The numbers of input and output channels are shown below the layers.
  • Figure 5: Multi-level Feature Fusion.
  • ...and 3 more figures
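
The captions for Figures 1 and 4 describe sampling at dilation levels $d=1$, $2$, and $3$. As a rough illustration of why higher dilation widens the sampling range, the sketch below lists the regular grid locations of a dilated kernel before any learned offsets are added. The 3x3 kernel size, the example offsets, and the function names are assumptions made for illustration and are not taken from the paper.

```python
def dilated_grid(kernel_size, dilation):
    """Regular sampling locations (dy, dx) of an odd-sized square kernel.

    Locations are relative to the kernel center and spaced `dilation` apart.
    """
    half = (kernel_size - 1) // 2
    return [
        (dy * dilation, dx * dilation)
        for dy in range(-half, half + 1)
        for dx in range(-half, half + 1)
    ]

def grid_span(kernel_size, dilation):
    """Width in pixels covered by the regular grid along one axis."""
    return (kernel_size - 1) * dilation + 1

def apply_offsets(grid, offsets):
    """Shift each regular location by a per-location offset.

    In a deformable layer the offsets are predicted by a network; here
    they are plain numbers supplied by the caller (hypothetical values).
    """
    return [(y + oy, x + ox) for (y, x), (oy, ox) in zip(grid, offsets)]

# Assumed 3x3 kernel at the three dilation levels named in the captions.
spans = {d: grid_span(3, d) for d in (1, 2, 3)}
print(spans)

# Example: shift every location of the d=2 grid half a pixel to the right.
shifted = apply_offsets(dilated_grid(3, 2), [(0.0, 0.5)] * 9)
```

Under these assumptions the number of sampled points stays at nine per level while the covered width grows with the dilation, which is the sense in which the multi-level extractor reaches more of the stretched content.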