DistGrid: Scalable Scene Reconstruction with Distributed Multi-resolution Hash Grid

Sidun Liu, Peng Qiao, Zongxin Ye, Wenyu Li, Yong Dou

TL;DR

This work tackles the difficulty of reconstructing large-scale scenes with memory- and capacity-constrained NeRF approaches. It introduces DistGrid, a distributed system that jointly trains deformable multi-resolution hash grids across non-overlapping AABBs, coupled with segmented volume rendering to handle cross-region rays without background NeRFs. Key contributions include deformable bounding boxes for scalable partitioning, a two-level coarse-fine partitioning strategy, and a segmentation-based rendering and training pipeline that enables efficient multi-GPU learning. Experiments on four large urban datasets show DistGrid yields higher fidelity and boundary-consistent reconstructions compared to state-of-the-art methods, highlighting its practical appeal for drone-captured large-scale scenes.

Abstract

Neural Radiance Fields (NeRF) achieve extremely high quality in object-scale and indoor scene reconstruction. However, reconstructing large-scale scenes remains challenging. MLP-based NeRFs suffer from limited network capacity, while volume-based NeRFs consume a large amount of memory as the scene resolution increases. Recent approaches partition the scene geographically and learn each sub-region with an individual NeRF. Such partitioning strategies help volume-based NeRFs exceed the single-GPU memory limit and scale to larger scenes. However, this approach requires multiple background NeRFs to handle out-of-partition rays, which leads to redundant learning. Inspired by the fact that the background of the current partition is the foreground of an adjacent partition, we propose a scalable scene reconstruction method based on joint Multi-resolution Hash Grids, named DistGrid. In this method, the scene is divided into multiple closely tiled yet non-overlapping Axis-Aligned Bounding Boxes (AABBs), and a novel segmented volume rendering method is proposed to handle cross-boundary rays, thereby eliminating the need for background NeRFs. Experiments demonstrate that our method outperforms existing methods on all evaluated large-scale scenes and provides visually plausible scene reconstruction. The scalability of our method in terms of reconstruction quality is further evaluated qualitatively and quantitatively.

Paper Structure (24 sections, 11 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: The segmented volume rendering process, illustrated with 4 partitions. Cross-region rays are rendered locally and merged globally. ① The scene is partitioned into 4 spatial regions, each reconstructed by a sub-NeRF model. ② Each input ray is first tested against the AABBs with a ray-AABB intersection test to obtain its start, end, and cross-boundary moments ($t_0\sim t_3$). ③ Rays are then segmented and distributed to the corresponding sub-NeRF models. ④ Local volume rendering is performed to obtain the local transmittance and color. ⑤ The local results are scattered within the group of sub-NeRF models that the ray intersects. ⑥ Finally, the gathered results are merged according to the segmented volume rendering equations for color and transmittance. Note that only the locally computed transmittance and color (marked with $[\cdot]$) require gradients in back-propagation.
  • Figure 2: Deformable Multi-resolution Hash Grid. Instant NGP wraps the scene in a cubic bounding box (left), which leads to unnecessary sampling of high-altitude and underground areas. A logical implementation that limits the altitude range (center) still incurs additional memory usage. Therefore, the cubic bounding box is extended to allow an arbitrary aspect ratio (right).
  • Figure 3: The two partitioning types described in the partitioning section. All cameras' FOVs are projected onto the ground plane; brighter regions are covered by more cameras. In coarse-fine partitioning, the fine-level box wraps the region covered by most cameras, while the coarse-level box wraps the region covered by any of these cameras. With region partitioning, the original scene can be partitioned into 2 regions or 4 regions.
  • Figure 4: Quantitative results on Rubble with 1, 2, and 4 partitions. The hash table length ranged from $2^{19}$ to $2^{24}$.
  • Figure 5: Qualitative evaluation of DistGrid, compared to NeRF and TensoRF.
  • ...and 1 more figure
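
The steps in the Figure 1 caption (intersect each ray with the AABBs, split it into per-region segments, render each segment locally, then merge) can be sketched in a few lines of plain Python. This is a minimal single-process illustration, not the authors' implementation: the function names (`ray_aabb_interval`, `render_segment`, `merge_segments`, `render_ray`) are hypothetical, one shared `density_fn`/`color_fn` pair stands in for the per-region sub-NeRF models, and no GPU communication is modeled. The merge step uses standard front-to-back compositing, in which each segment's local color is weighted by the product of the transmittances of the segments in front of it; this is assumed to correspond to the paper's segmented volume rendering equations, which are not reproduced on this page.

```python
import math


def ray_aabb_interval(origin, direction, box_min, box_max):
    """Slab test: return (t_enter, t_exit) of the ray inside the box, or None on a miss."""
    t_enter, t_exit = 0.0, math.inf  # clamp to t >= 0 so the ray starts at its origin
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of slabs: it must already lie between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter, t_exit = max(t_enter, t0), min(t_exit, t1)
    if t_enter >= t_exit:
        return None
    return t_enter, t_exit


def render_segment(density_fn, color_fn, origin, direction, t_start, t_end, n_samples=64):
    """Local volume rendering on [t_start, t_end]; returns (local_color, local_transmittance)."""
    dt = (t_end - t_start) / n_samples
    trans = 1.0
    color = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = t_start + (i + 0.5) * dt  # midpoint sampling
        point = [o + t * d for o, d in zip(origin, direction)]
        alpha = 1.0 - math.exp(-density_fn(point) * dt)
        weight = trans * alpha
        color = [acc + weight * c for acc, c in zip(color, color_fn(point))]
        trans *= 1.0 - alpha
    return color, trans


def merge_segments(segments):
    """Composite (color, transmittance) pairs that are ordered front to back."""
    total_color = [0.0, 0.0, 0.0]
    total_trans = 1.0
    for color, trans in segments:
        # A segment's contribution is attenuated by everything in front of it.
        total_color = [acc + total_trans * c for acc, c in zip(total_color, color)]
        total_trans *= trans
    return total_color, total_trans


def render_ray(origin, direction, boxes, density_fn, color_fn, n_samples=64):
    """Segment a ray by the AABBs it crosses, render each segment, and merge the results."""
    hits = []
    for box_min, box_max in boxes:
        interval = ray_aabb_interval(origin, direction, box_min, box_max)
        if interval is not None:
            hits.append(interval)
    hits.sort()  # front-to-back order along the ray
    segments = [
        render_segment(density_fn, color_fn, origin, direction, t0, t1, n_samples)
        for t0, t1 in hits
    ]
    return merge_segments(segments)
```

Because the merge needs only one color and one transmittance per segment, each region can render its own part of the ray independently and exchange just those two quantities, which is what removes the need for a separate background model in this sketch.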