3D Scene Geometry Estimation from 360$^\circ$ Imagery: A Survey
Thiago Lopes Trugillo da Silveira, Paulo Gamarra Lessa Pinto, Jeffri Erwin Murrugarra Llerena, Claudio Rosito Jung
TL;DR
The paper surveys 3D scene geometry estimation from omnidirectional 360° imagery, covering the foundations of the spherical camera model and common representations, and then reviews monocular, stereo, and multi-view approaches on panoramas. It highlights the dominance of learning-based methods for single-view depth and layout estimation, surveys a wide range of representation schemes (equirectangular projection (ERP), cube-map projection (CMP), tangent planes, icospheres, SpherePHD), and discusses datasets and evaluation metrics. Key contributions include a structured taxonomy of panorama-based 3D reconstruction methods, a consolidated view of public datasets, and a critical assessment of state-of-the-art results across indoor and outdoor scenarios. The survey underscores the need for standardized benchmarks and scalable spherical learning approaches to enable robust, real-time 6-DoF immersive navigation in AR/VR/MR and autonomous systems.
Abstract
This paper provides a comprehensive survey of pioneering and state-of-the-art 3D scene geometry estimation methodologies based on one, two, or multiple images captured with omnidirectional optics. We first revisit the basic concepts of the spherical camera model and review the most common acquisition technologies and representation formats suitable for omnidirectional (also called 360$^\circ$, spherical, or panoramic) images and videos. We then survey monocular layout and depth inference approaches, highlighting recent advances in learning-based solutions suited to spherical data. Classical stereo matching is then revisited in the spherical domain, where methodologies for detecting and describing sparse and dense features become crucial. The stereo matching concepts are then extended to multiple-view camera setups, which we categorize into light fields, multi-view stereo, and structure from motion (or visual simultaneous localization and mapping). We also compile and discuss commonly adopted datasets and the figures of merit suited to each task, and list recent results for completeness. We conclude by pointing out current and future trends.
