OmniSDF: Scene Reconstruction using Omnidirectional Signed Distance Functions and Adaptive Binoctrees

Hakyeong Kim, Andreas Meuleman, Hyeonjoong Jang, James Tompkin, Min H. Kim

TL;DR

OmniSDF is a neural SDF reconstruction framework tailored to short-baseline omnidirectional video. It uses an adaptive spherical binoctree of spherical voxels (sphoxels) to concentrate sampling near surfaces and limit memory use. Starting from a depth-guided binoctree initialization, it refines sampling iteratively through coarse-to-fine subdivision, with a BFS-based traversal to handle the irregular sphoxel shapes. The approach recovers detailed large-scale scene geometry with significantly fewer voxels than Cartesian grids and shows competitive or superior accuracy compared with traditional and neural baselines on synthetic and real omnidirectional data. This makes it practical for reconstructing indoor and outdoor scenes, and publicly available code supports replication.

Abstract

We present a method to reconstruct indoor and outdoor static scene geometry and appearance from an omnidirectional video moving in a small circular sweep. This setting is challenging because of the small baseline and large depth ranges, making it difficult to find ray crossings. To better constrain the optimization, we estimate geometry as a signed distance field within a spherical binoctree data structure and use a complementary efficient tree traversal strategy based on a breadth-first search for sampling. Unlike regular grids or trees, the shape of this structure well-matches the camera setting, creating a better memory-quality trade-off. From an initial depth estimate, the binoctree is adaptively subdivided throughout the optimization; previous methods use a fixed depth that leaves the scene undersampled. In comparison with three neural optimization methods and two non-neural methods, ours shows decreased geometry error on average, especially in a detailed scene, while significantly reducing the required number of voxels to represent such details.
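
The data structure and traversal described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Sphoxel` class, the `subdivide` modes, and the rule that "oct" halves all three spherical axes while "bin" halves only the radial axis are assumptions made for illustration. In the paper, subdivision is driven by the initial depth estimate and the optimization; here the cell to refine is picked by hand.

```python
import math
from collections import deque
from dataclasses import dataclass, field


def halves(bounds):
    """Split a (lo, hi) interval at its midpoint."""
    lo, hi = bounds
    mid = 0.5 * (lo + hi)
    return [(lo, mid), (mid, hi)]


@dataclass
class Sphoxel:
    """Hypothetical node: a cell bounded in radius, polar angle, and azimuth."""
    r: tuple
    theta: tuple
    phi: tuple
    children: list = field(default_factory=list)

    def subdivide(self, mode="oct"):
        # Assumed rule: "oct" halves all three axes (8 children),
        # "bin" halves only the radial axis (2 children).
        r_parts = halves(self.r)
        theta_parts = halves(self.theta) if mode == "oct" else [self.theta]
        phi_parts = halves(self.phi) if mode == "oct" else [self.phi]
        self.children = [
            Sphoxel(r, t, p)
            for r in r_parts for t in theta_parts for p in phi_parts
        ]
        return self.children


def bfs_leaves(root):
    """Visit nodes level by level and collect the leaf sphoxels."""
    queue = deque([root])
    leaves = []
    while queue:
        node = queue.popleft()
        if node.children:
            queue.extend(node.children)
        else:
            leaves.append(node)
    return leaves


# Root cell covering the full sphere around the camera (bounds are made up).
root = Sphoxel(r=(0.1, 10.0), theta=(0.0, math.pi), phi=(0.0, 2.0 * math.pi))
coarse = root.subdivide("oct")   # 8 coarse cells
coarse[0].subdivide("bin")       # refine one cell along the radial axis only
leaves = bfs_leaves(root)
print(len(leaves))               # 7 untouched coarse cells + 2 refined cells
```

A breadth-first queue visits coarse cells before their refinements, which is one simple way to gather sampling cells from a tree whose leaves sit at different depths.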

Paper Structure (11 sections, 3 equations, 11 figures, 4 tables)

This paper contains 11 sections, 3 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: We introduce a memory-efficient neural 3D reconstruction method tailored to work with short egocentric omnidirectional video inputs. The geometry is estimated using a signed distance field and a novel adaptive spherical binoctree data structure subdivided through iterative optimization. We show that our method outperforms other state-of-the-art 3D reconstruction methods in balancing detail and memory cost (schonberger2016structure, jang2022egocentric, Yu2022SDFStudio).
  • Figure 2: (a) Cuboid approximation of a sphoxel for intersection testing. Ray-triangle intersection tests occur for all triangles that comprise cuboid faces. (b) Illustration of rubrics for surface detection in sphoxels. A surface exists in a sphoxel if $\|d\| < R$.
  • Figure 3: Sampling strategies on a unit sphere. (a) Sphere sampling range for the exocentric inward-looking model. (b)--(d) Sampling for egocentric data using a coarse-to-fine binoctree.
  • Figure 4: Adaptive sampling sphoxel bounds. Left: Coarse (blue) sphoxel vertices for sampling. Center: A sample of points from coarse (top) and fine (bottom) sphoxel intersections. Right: Fine (green) sphoxel vertices, which are denser near the surface.
  • Figure 5: We compare our method with traditional and neural methods using ground-truth geometry. We present qualitative results of NeuS wang2021neus, NeuS-facto Yu2022SDFStudio, and Neuralangelo li2023neuralangelo here; our method produces higher-quality 3D geometry. Please refer to Table \ref{tab:eval_synthetic} for complementary quantitative evaluation and to the supplemental material for further qualitative comparisons with the omitted traditional methods, COLMAP schonberger2016structure and EgocentricRecon jang2022egocentric.
  • ...and 6 more figures
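
The surface-detection rubric in the Figure 2(b) caption, $\|d\| < R$, can be sketched as follows. The caption does not define the symbols, so this sketch assumes that $d$ is the signed distance sampled at the sphoxel center and that $R$ approximates the sphoxel's extent as the distance from the center to its farthest corner. The function names and the toy `sphere_sdf` scene are hypothetical.

```python
import math


def to_cartesian(r, theta, phi):
    """Spherical (radius, polar angle, azimuth) to Cartesian coordinates."""
    return (
        r * math.sin(theta) * math.cos(phi),
        r * math.sin(theta) * math.sin(phi),
        r * math.cos(theta),
    )


def center_and_radius(r_range, theta_range, phi_range):
    """Cell midpoint and its distance to the farthest of the 8 corners."""
    center = to_cartesian(
        sum(r_range) / 2, sum(theta_range) / 2, sum(phi_range) / 2
    )
    corners = [
        to_cartesian(r, t, p)
        for r in r_range for t in theta_range for p in phi_range
    ]
    return center, max(math.dist(center, c) for c in corners)


def may_contain_surface(sdf, r_range, theta_range, phi_range):
    """Apply the |d| < R test with d sampled at the cell center (assumed)."""
    center, radius = center_and_radius(r_range, theta_range, phi_range)
    return abs(sdf(center)) < radius


def sphere_sdf(p):
    # Toy scene (assumption): a sphere of radius 2 centered on the camera.
    return math.sqrt(sum(c * c for c in p)) - 2.0


# A cell straddling the sphere surface, and a cell far behind it.
near = may_contain_surface(sphere_sdf, (1.5, 2.5), (1.0, 1.2), (0.0, 0.2))
far = may_contain_surface(sphere_sdf, (8.0, 9.0), (1.0, 1.2), (0.0, 0.2))
print(near, far)
```

Because a signed distance value bounds how close the nearest surface can be, a cell whose center is farther from any surface than the cell's own extent can be skipped, which is what lets fine sphoxels cluster near the surface as in Figure 4.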