Multi-task Geometric Estimation of Depth and Surface Normal from Monocular 360° Images
Kun Huang, Fang-Lue Zhang, Fangfang Zhang, Yu-Kun Lai, Paul L. Rosin, Neil A. Dodgson
TL;DR
This work tackles monocular 360° geometry estimation by jointly predicting depth and surface normals from equirectangular-projection (ERP) panoramas. It introduces a distortion-aware multi-task transformer framework. A fusion module enables soft parameter sharing between the depth and normal branches, and a multi-scale spherical decoder captures both fine detail and global geometry. The approach achieves state-of-the-art results on five panoramic benchmarks and generalizes robustly, although it still has difficulty with reflective materials. By providing coherent, dense geometric cues in challenging 360° environments, the model offers practical benefits for indoor scene understanding, robot navigation, and 3D reconstruction.
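An ERP panorama maps longitude and latitude linearly to image columns and rows. As a result, rows near the poles are stretched horizontally, and this is the distortion a distortion-aware model has to cope with. The sketch below is not the authors' code. It shows three things:

- the standard mapping from an ERP pixel to a unit ray on the sphere;
- a per-row weight proportional to cos(latitude), which reflects each pixel's relative solid angle;
- back-projection of a depth value to a 3D point.

The function names, the axis convention (y up, longitude measured from +z), and the assumption that depth is the Euclidean distance along the ray are all hypothetical choices made for illustration.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def erp_pixel_to_direction(u: int, v: int, width: int, height: int) -> Tuple[Vec3, float]:
    """Map the centre of ERP pixel (u, v) to a unit ray and its latitude.

    Assumed convention: longitude runs from -pi (left) to +pi (right),
    latitude from +pi/2 (top row) to -pi/2 (bottom row), y axis points up.
    """
    lon = ((u + 0.5) / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - ((v + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z), lat


def erp_row_weight(v: int, height: int) -> float:
    """Relative solid angle of a pixel in row v: proportional to cos(latitude).

    Rows near the poles get weights close to 0 because they are over-sampled.
    """
    lat = math.pi / 2.0 - ((v + 0.5) / height) * math.pi
    return math.cos(lat)


def back_project(depth: float, u: int, v: int, width: int, height: int) -> Vec3:
    """3D point for a pixel, assuming depth is the distance along the ray."""
    (x, y, z), _ = erp_pixel_to_direction(u, v, width, height)
    return (depth * x, depth * y, depth * z)
```

For a 512x256 panorama, `erp_row_weight(0, 256)` is roughly 0.006 while rows at the equator have weights close to 1. Scaling per-pixel losses by such a weight is one common way to keep the over-sampled polar rows from dominating training.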
Abstract
Geometric estimation is essential for scene understanding and analysis in panoramic 360° images. Current methods usually predict a single geometric quantity, such as depth or surface normals, and can lack robustness on intricate textures or complex object surfaces. We introduce a novel multi-task learning (MTL) network that estimates depth and surface normals simultaneously from 360° images. Our first contribution is the MTL architecture itself. It improves the predictions for both tasks by integrating the geometric information from depth and surface normal estimation, giving the network a deeper understanding of 3D scene structure. Our second contribution is a fusion module that bridges the two tasks, allowing the network to learn shared representations that improve accuracy and robustness. Experimental results show that our MTL architecture significantly outperforms state-of-the-art methods in both depth and surface normal estimation, with superior performance in complex and diverse scenes. The model's effectiveness and generalizability, particularly on intricate surface textures, establish a new benchmark for geometric estimation from 360° images. The code and model are available at https://github.com/huangkun101230/360MTLGeometricEstimation.
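The sketch below makes the idea of two tasks sharing representations concrete. It uses a cross-stitch-style unit, where each task's output feature is a learned linear mix of both tasks' features, and adds a simple weighted joint loss. This is a generic, swapped-in soft-parameter-sharing technique, not the paper's fusion module, whose internals the abstract does not describe. The L1 depth loss, the cosine normal loss, the loss weights, and all function names are illustrative assumptions. Features are plain Python lists so the example runs without a deep-learning framework.

```python
import math
from typing import List, Sequence, Tuple

Alpha = Tuple[Tuple[float, float], Tuple[float, float]]


def cross_stitch(depth_feat: Sequence[float], normal_feat: Sequence[float],
                 alpha: Alpha) -> Tuple[List[float], List[float]]:
    """Mix depth and normal features with a 2x2 matrix (cross-stitch style).

    depth'  = a_dd * depth + a_dn * normal
    normal' = a_nd * depth + a_nn * normal
    In a trained network, alpha would be a learnable parameter.
    """
    (a_dd, a_dn), (a_nd, a_nn) = alpha
    fused_depth = [a_dd * d + a_dn * n for d, n in zip(depth_feat, normal_feat)]
    fused_normal = [a_nd * d + a_nn * n for d, n in zip(depth_feat, normal_feat)]
    return fused_depth, fused_normal


def depth_l1(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean absolute depth error (assumed loss choice)."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def normal_cosine(pred: Sequence[Sequence[float]],
                  gt: Sequence[Sequence[float]]) -> float:
    """Mean of (1 - cosine similarity) between predicted and true normals."""
    total = 0.0
    for p, g in zip(pred, gt):
        dot = sum(a * b for a, b in zip(p, g))
        norms = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in g))
        total += 1.0 - dot / max(norms, 1e-12)  # guard against zero vectors
    return total / len(pred)


def joint_loss(depth_pred: Sequence[float], depth_gt: Sequence[float],
               normal_pred: Sequence[Sequence[float]],
               normal_gt: Sequence[Sequence[float]],
               w_depth: float = 1.0, w_normal: float = 1.0) -> float:
    """Weighted sum of the two task losses (hypothetical weights)."""
    return (w_depth * depth_l1(depth_pred, depth_gt)
            + w_normal * normal_cosine(normal_pred, normal_gt))
```

Setting `alpha` to the identity matrix keeps the two branches fully separate. Non-zero off-diagonal entries let depth features inform normal prediction and vice versa, and training would decide how much sharing helps.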
