CRF360D: Monocular 360 Depth Estimation via Spherical Fully-Connected CRFs

Zidong Cao, Lin Wang

TL;DR

The paper addresses monocular 360° depth estimation from equirectangular projection (ERP) images, where distortion leaves regular windows, especially those near the poles, with too few spherical neighbors. It introduces spherical fully-connected CRFs (SF-CRFs), which combine a Spherical Window Transform (SWT) and a Planar-Spherical Interaction (PSI) module to model both local planar and global spherical relationships, and builds the CRF360D decoder from SF-CRFs blocks. The method reports state-of-the-art results on Stanford2D3D, Matterport3D, and Structured3D. By leveraging the rotational invariance of the sphere, SWT replicates the equator window's spherical relationships to all other windows at low cost (0.038 seconds on CPU for a 512×1024 ERP), and the decoder works with perspective-image-trained backbones such as EfficientNet-B5.

Abstract

Monocular 360 depth estimation is challenging due to the inherent distortion of the equirectangular projection (ERP). This distortion causes a problem: spherically adjacent points are separated after being projected to the ERP plane, particularly in the polar regions. To tackle this problem, recent methods calculate the spherical neighbors in the tangent domain. However, as the tangent patch and the sphere have only one common point, these methods can construct neighboring spherical relationships only around that common point. In this paper, we propose spherical fully-connected CRFs (SF-CRFs). We begin by evenly partitioning an ERP image with regular windows, where windows at the equator involve broader spherical neighbors than those at the poles. To improve the spherical relationships, our SF-CRFs have two key components. First, to involve sufficient spherical neighbors, we propose a Spherical Window Transform (SWT) module. This module replicates the equator window's spherical relationships to all other windows, leveraging the rotational invariance of the sphere. Remarkably, the transformation process is highly efficient, completing the transformation of all windows in a 512×1024 ERP in 0.038 seconds on a CPU. Second, we propose a Planar-Spherical Interaction (PSI) module to build relationships between regular and transformed windows, which not only preserves local details but also captures global structures. By building a decoder based on SF-CRFs blocks, we propose CRF360D, a novel 360 depth estimation framework that achieves state-of-the-art performance across diverse datasets. CRF360D is compatible with different backbones trained on perspective images (e.g., EfficientNet), which serve as the encoder.
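The window transform described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the pixel-center convention, and the choice of a pitch-then-yaw rotation are all assumptions made for illustration. The sketch takes a square template window centered on the equator, rotates its nodes on the unit sphere so that the window center lands on a target point, and maps the rotated nodes back to (fractional) ERP pixel coordinates.

```python
import numpy as np

def erp_to_lonlat(u, v, width, height):
    """Map ERP pixel coordinates to longitude/latitude (radians), pixel-center convention (assumed)."""
    lon = (u + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi
    return lon, lat

def lonlat_to_erp(lon, lat, width, height):
    """Inverse of erp_to_lonlat; returns fractional pixel coordinates."""
    u = (lon + np.pi) / (2.0 * np.pi) * width - 0.5
    v = (np.pi / 2.0 - lat) / np.pi * height - 0.5
    return u, v

def lonlat_to_xyz(lon, lat):
    """Unit vectors on the sphere; (lon=0, lat=0) maps to (1, 0, 0)."""
    return np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=-1)

def rotation_to(lon_c, lat_c):
    """Rotation that carries the equator point (lon=0, lat=0) to (lon_c, lat_c)."""
    cl, sl = np.cos(lat_c), np.sin(lat_c)
    co, so = np.cos(lon_c), np.sin(lon_c)
    pitch = np.array([[cl, 0.0, -sl], [0.0, 1.0, 0.0], [sl, 0.0, cl]])   # raise to latitude lat_c
    yaw = np.array([[co, -so, 0.0], [so, co, 0.0], [0.0, 0.0, 1.0]])     # turn to longitude lon_c
    return yaw @ pitch

def spherical_window_coords(lon_c, lat_c, win, width, height):
    """ERP sampling positions of an equator-template window rotated to (lon_c, lat_c)."""
    # Template window: win x win pixels centered on (lon=0, lat=0); win is assumed even.
    u_idx = width / 2 - win / 2 + np.arange(win)
    v_idx = height / 2 - win / 2 + np.arange(win)
    uu, vv = np.meshgrid(u_idx, v_idx)
    xyz = lonlat_to_xyz(*erp_to_lonlat(uu, vv, width, height))
    # Rotate every node; a rotation preserves all pairwise angular distances.
    xyz_rot = xyz @ rotation_to(lon_c, lat_c).T
    lon = np.arctan2(xyz_rot[..., 1], xyz_rot[..., 0])
    lat = np.arcsin(np.clip(xyz_rot[..., 2], -1.0, 1.0))
    u, v = lonlat_to_erp(lon, lat, width, height)
    return np.mod(u, width), v   # longitude wraps around the image border

if __name__ == "__main__":
    u, v = spherical_window_coords(0.0, np.deg2rad(80.0), win=8, width=1024, height=512)
    print("horizontal extent near the pole (pixels):", u.max() - u.min())
```

Under these assumptions, a window rotated toward the pole covers many more ERP columns than the regular window of the same size, while the angular distances between its nodes stay identical to those of the equator template. The returned positions are fractional, so a real pipeline would still need an interpolation step (e.g. bilinear sampling) to gather features at them.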

Paper Structure (16 sections, 7 equations, 10 figures, 8 tables)

Figures (10)

  • Figure 1: First row: ERP image exhibits severe distortions. The regular window at the pole involves insufficient spherical points. First column: With our proposed spherical window transform (SWT), each window is transformed to have sufficient spherical relationships. It is based on the rotational invariance of the sphere. Last column: After transformation, the distortion is significantly reduced. The transformed window has better spherical relationships.
  • Figure 2: Illustration of the SWT module and PSI module of our proposed SF-CRFs.
  • Figure 3: (a) The overall pipeline of the proposed CRF360D. (b) The architecture of the proposed SF-CRFs.
  • Figure 4: Qualitative results on Stanford2D3D (top), Matterport3D (middle), and Structured3D (bottom) datasets.
  • Figure 5: Visualization of the window transformation results in the SWT module. Red dots ($\bullet$): nodes in the template window; blue crosses ($\times$): nodes in the target window; dots in other colors ($\bullet$): nodes in the transformed windows.
  • ...and 5 more figures