A Survey of Representation Learning, Optimization Strategies, and Applications for Omnidirectional Vision

Hao Ai, Zidong Cao, Lin Wang

TL;DR

This survey comprehensively analyzes deep learning methods for omnidirectional vision, covering imaging principles, projection formats, and publicly available datasets, then organizes representation learning and optimization strategies into a hierarchical taxonomy. It surveys Euclidean and non-Euclidean ODI representations, distortion-aware learning, and cross-domain transfer, highlighting how projection choices affect model design. The paper catalogs DL approaches across visual enhancement, scene understanding, and 3D geometry/motion estimation, offering representative methods, datasets, and performance trends, and discusses open challenges and promising directions such as distortion-aware attention, transformer-based spherical models, and data-efficient learning. By connecting imaging, learning, and application aspects, the work aims to accelerate research and practical deployment of omnidirectional vision in AR/VR, robotics, and autonomous driving.

Abstract

Omnidirectional image (ODI) data is captured with a field-of-view of $360^\circ \times 180^\circ$, which is much wider than that of pinhole cameras and captures richer details of the surrounding environment than conventional perspective images. In recent years, the availability of consumer-level $360^\circ$ cameras has made omnidirectional vision more popular, and the advance of deep learning (DL) has significantly spurred its research and applications. This paper presents a systematic and comprehensive review and analysis of the recent progress of DL for omnidirectional vision. It delineates the distinct challenges and complexities encountered in applying DL to omnidirectional images as opposed to traditional perspective imagery. Our work covers five main contents: (i) A thorough introduction to the principles of omnidirectional imaging and commonly explored projections of ODI; (ii) A methodical review of varied representation learning approaches tailored for ODI; (iii) An in-depth investigation of optimization strategies specific to omnidirectional vision; (iv) A structural and hierarchical taxonomy of the DL methods for the representative omnidirectional vision tasks, from visual enhancement (e.g., image generation and super-resolution) to 3D geometry and motion estimation (e.g., depth and optical flow estimation), alongside discussions of emergent research directions; (v) An overview of cutting-edge applications (e.g., autonomous driving and virtual reality), coupled with a critical discussion of prevailing challenges and open questions, to trigger more research in the community.

Paper Structure

This paper contains 34 sections, 3 equations, 24 figures, 8 tables.

Figures (24)

  • Figure 1: Overview of representation learning, optimization strategies, and applications for omnidirectional vision.
  • Figure 2: Hierarchical and structural taxonomy of omnidirectional vision with deep learning.
  • Figure 3: Examples of $360^\circ$ cameras: (a) RICOH Theta Z1, and (b) GoPro Omni.
  • Figure 4: Imaging principles of several cameras: (a) Pinhole camera; (b) Fisheye camera; (c) $360^\circ$ camera (dual-fisheye); (d) $360^\circ$ camera (multi-fisheye).
  • Figure 5: Illustration of the process for stitching a pair of dual-fisheye images into an ERP format ODI.
  • ...and 19 more figures
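
The ERP (equirectangular projection) format named in Figure 5 lays the full $360^\circ \times 180^\circ$ field of view out on a regular pixel grid. As a minimal sketch of that mapping, and not the survey's own formulation, the Python below converts between ERP pixel coordinates and spherical angles. The function names, the half-pixel centre offset, and the axis convention (y up, z forward at longitude 0) are all assumptions chosen for illustration; other conventions are equally common.

```python
import math

def erp_pixel_to_sphere(u, v, width, height):
    """Map an ERP pixel (u, v) to (longitude, latitude) in radians.

    Assumed convention: longitude spans [-pi, pi) left to right,
    latitude spans +pi/2 (top row) to -pi/2 (bottom row), and
    integer (u, v) refer to pixel centres (hence the +0.5 offset).
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    return lon, lat

def sphere_to_erp_pixel(lon, lat, width, height):
    """Inverse of erp_pixel_to_sphere under the same assumed convention."""
    u = (lon / (2.0 * math.pi) + 0.5) * width - 0.5
    v = (0.5 - lat / math.pi) * height - 0.5
    return u, v

def sphere_to_unit_vector(lon, lat):
    """Convert spherical angles to a 3D unit vector (assumed: y up, z forward)."""
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return x, y, z

if __name__ == "__main__":
    # Hypothetical 2:1 ERP resolution, used only for this example.
    width, height = 2048, 1024
    lon, lat = erp_pixel_to_sphere(100, 200, width, height)
    print(sphere_to_erp_pixel(lon, lat, width, height))
    print(sphere_to_unit_vector(lon, lat))
```

Because every image row covers the full $360^\circ$ of longitude regardless of latitude, rows near the poles are stretched horizontally. This is the projection distortion that the distortion-aware methods mentioned in the TL;DR are designed to handle.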