Robust and Flexible Omnidirectional Depth Estimation with Multiple 360-degree Cameras

Ming Li, Xuejiao Hu, Xueqian Jin, Jinghao Cao, Sidan Du, Yang Li

TL;DR

This work tackles robust omnidirectional depth estimation across diverse 360° camera rigs under lens soiling and layout variations. It introduces a Generalized Epipolar Equirectangular (GEER) projection and two geometry-constrained pipelines: a two-stage Pairwise Stereo MODE (PSMODE) and a one-stage Spherical Sweeping MODE (SSMODE), both supported by a spherical feature extraction module. A new synthetic outdoor dataset, Deep360, with soiled panorama variants is presented to train and evaluate 360° depth estimation under realistic conditions. Empirical results show state-of-the-art depth predictions with strong robustness to soiling, along with demonstrated flexibility to different camera configurations and numbers, highlighting practical impact for autonomous driving and robotics.

Abstract

Omnidirectional depth estimation has received much attention from researchers in recent years. However, challenges arise due to camera soiling and variations in camera layouts, affecting the robustness and flexibility of the algorithm. In this paper, we use the geometric constraints and redundant information of multiple 360-degree cameras to achieve robust and flexible multi-view omnidirectional depth estimation. We implement two algorithms, in which the two-stage algorithm obtains initial depth maps by pairwise stereo matching of multiple cameras and fuses the multiple depth maps to achieve the final depth estimation; the one-stage algorithm adopts spherical sweeping based on hypothetical depths to construct a uniform spherical matching cost of the multi-camera images and obtain the depth. Additionally, a generalized epipolar equirectangular projection is introduced to simplify the spherical epipolar constraints. To overcome panorama distortion, a spherical feature extractor is implemented. Furthermore, a synthetic 360-degree dataset consisting of 12K road scene panoramas and 3K ground truth depth maps is presented to train and evaluate 360-degree depth estimation algorithms. Our dataset takes soiled camera lenses and glare into consideration, which is more consistent with the real-world environment. Experiments show that our two algorithms achieve state-of-the-art performance, accurately predicting depth maps even when provided with soiled panorama inputs. The flexibility of the algorithms is experimentally validated in terms of camera layouts and numbers.
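
To make the two-stage idea concrete, the fusion step can be sketched as a per-pixel confidence-weighted average of the pairwise depth maps. This is only an illustrative stand-in: the text above does not specify the fusion operator, and the function name `fuse_depths`, the nested-list image format, and the assumption that every pairwise estimate comes with a per-pixel confidence in [0, 1] are all hypothetical.

```python
def fuse_depths(depth_maps, confidence_maps, eps=1e-6):
    """Fuse several depth maps by a per-pixel confidence-weighted average.

    depth_maps and confidence_maps are lists of equally sized 2D nested
    lists (one per stereo pair). This weighting rule is an assumption made
    for illustration, not the fusion method of the paper.
    """
    height = len(depth_maps[0])
    width = len(depth_maps[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Weighted sum of the depth hypotheses at this pixel.
            num = sum(d[r][c] * w[r][c]
                      for d, w in zip(depth_maps, confidence_maps))
            # Total confidence; eps guards against division by zero.
            den = sum(w[r][c] for w in confidence_maps)
            fused[r][c] = num / max(den, eps)
    return fused


# Two 1x2 depth maps. The second pair is assumed to be soiled at pixel 0,
# so its confidence there is 0 and its (wrong) depth is ignored.
depths = [[[10.0, 20.0]], [[30.0, 20.0]]]
confs = [[[1.0, 0.5]], [[0.0, 0.5]]]
fused = fuse_depths(depths, confs)
```

Under this sketch, a stereo pair whose view is soiled at some pixel contributes little there as long as its confidence is low, which is one simple way redundant views can compensate for each other.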

Paper Structure

This paper contains 29 sections, 8 equations, 13 figures, and 8 tables.

Figures (13)

  • Figure 1: Overview of the proposed robust and flexible multi-view omnidirectional depth estimation framework. (a) and (b) show the rig of multiple 360° cameras. (c) and (d) show the predicted depth maps and reconstructed point clouds on synthetic and real-world data. (e) illustrates the different types of camera soiling encountered in practice. For each sample in (e), the upper and lower images show soiled panoramas from the real-world and synthetic datasets, respectively
  • Figure 2: (a) The coordinate definition and geometry of the proposed generalized epipolar equirectangular (GEER) projection. (b) Samples of omnidirectional stereo pairs at different relative poses under the GEER projection. The spherical epipolar constraint is simplified to horizontal lines under the GEER projection
  • Figure 3: (a) The process of spherical sweeping. (b) The construction of the spherical cost volume. Points at different hypothetical depths are projected into each camera's coordinates to obtain features. The features of the same point from different cameras are then concatenated to represent the matching cost
  • Figure 4: The structure of the proposed spherical feature extraction module. The module is built from four stages of residual blocks, and features from the different stages are fused. Sphere convolution is adopted in the last stage to obtain high-level semantic and context features
  • Figure 5: The architecture of the proposed PSMODE, which estimates the omnidirectional depth map in two stages. In the first stage, an omnidirectional stereo matching network produces depth maps and confidence maps for the different stereo pairs. In the second stage, the multi-view depth maps are fused to estimate the final depth map
  • ...and 8 more figures
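
The geometry behind the spherical sweeping described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses the standard latitude-longitude equirectangular mapping rather than the GEER projection, assumes the second camera shares the reference camera's orientation (translation only), and all function names are hypothetical.

```python
import math


def erp_pixel_to_ray(u, v, width, height):
    """Map an equirectangular pixel centre to a unit ray (assumed convention)."""
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    return (math.cos(lat) * math.sin(lon),
            math.sin(lat),
            math.cos(lat) * math.cos(lon))


def point_to_erp_pixel(point, width, height):
    """Project a 3D point in a camera frame to equirectangular pixel coordinates."""
    x, y, z = point
    radius = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)
    lat = math.asin(y / radius)
    u = (lon + math.pi) / (2.0 * math.pi) * width - 0.5
    v = (math.pi / 2.0 - lat) / math.pi * height - 0.5
    return (u, v)


def sweep_pixel(u, v, depths, cam_center, width, height):
    """For each hypothetical depth, find where the reference pixel lands
    in another camera located at cam_center (same orientation assumed)."""
    ray = erp_pixel_to_ray(u, v, width, height)
    samples = []
    for depth in depths:
        # 3D point on the reference ray at the hypothesised depth.
        point = tuple(depth * comp for comp in ray)
        # Express the point relative to the other camera's centre.
        rel = tuple(p - c for p, c in zip(point, cam_center))
        samples.append(point_to_erp_pixel(rel, width, height))
    return samples


# Sweep one reference pixel over two depth hypotheses for a camera
# placed 0.5 units to the side of the reference camera.
W, H = 64, 32
samples = sweep_pixel(32, 16, [1.0, 1000.0], (0.5, 0.0, 0.0), W, H)
```

In a full pipeline, features from each camera would be sampled at the returned coordinates for every depth hypothesis and combined into a cost volume. As expected from the geometry, the sample for a distant hypothesis falls almost on the reference pixel, while a near hypothesis is shifted noticeably.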