InfNeRF: Towards Infinite Scale NeRF Rendering with O(log n) Space Complexity

Jiabin Liang, Lanqing Zhang, Zhuoran Zhao, Xiangyu Xu

TL;DR

InfNeRF introduces an octree-based Level-of-Detail framework for NeRFs that renders large-scale scenes with $O(\log n)$ memory while keeping NeRF fidelity. Each octree node hosts a NeRF, and sampling points are routed to the level that matches their spatial scale, which enables efficient memory access and reduces aliasing through hierarchical smoothing. It adds an SfM-driven pruning strategy to obtain a compact, adaptive tree and a distributed training approach that preserves $O(n)$ training complexity, allowing scalable, multi-device training. Empirical results on four large urban drone datasets and the MipNeRF 360 garden scene show strong memory efficiency and improved multi-resolution rendering, including up to $2.4$ dB PSNR gains over baselines. The approach is compatible with various NeRF backbones and sets a foundation for scalable large-scale neural scene representations with LoD octrees.

Abstract

The conventional mesh-based Level of Detail (LoD) technique, exemplified by applications such as Google Earth and many game engines, can holistically represent a large scene, even the Earth, and achieves rendering with a space complexity of O(log n). This constrained data requirement not only enhances rendering efficiency but also facilitates dynamic data fetching, thereby enabling a seamless 3D navigation experience for users. In this work, we extend this proven LoD technique to Neural Radiance Fields (NeRF) by introducing an octree structure to represent the scene at different scales. This approach provides a mathematically simple and elegant representation with a rendering space complexity of O(log n), aligned with the efficiency of mesh-based LoD techniques. We also present a novel training strategy that maintains a complexity of O(n). This strategy allows for parallel training with minimal overhead, ensuring the scalability and efficiency of our proposed method. Our contribution lies not only in extending the capabilities of existing techniques but also in establishing a foundation for scalable and efficient large-scale scene representation using NeRF and octree structures.
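
The $O(\log n)$ claim can be motivated with a rough counting argument. This is a heuristic sketch, not the authors' proof, and it rests on assumptions that the abstract does not state: a sample at depth $d$ has radius $r = kd$ (with $k$ the angular size of a pixel), a node at level $\ell$ has edge length $s_\ell = s_0 2^{-\ell}$ and granularity $g_\ell = g_0 2^{-\ell}$, and level $\ell$ serves the radii $r \in [g_\ell, 2g_\ell)$. The symbols $k$, $s_0$, $g_0$, $\Omega$, $V_\ell$ and $L$ are introduced here for illustration only.

$$
V_\ell = \frac{\Omega}{3}\left[\left(\frac{2 g_\ell}{k}\right)^{3} - \left(\frac{g_\ell}{k}\right)^{3}\right]
       = \frac{7\,\Omega}{3}\left(\frac{g_\ell}{k}\right)^{3},
\qquad
\frac{V_\ell}{s_\ell^{3}} = \frac{7\,\Omega}{3}\left(\frac{g_0}{k\,s_0}\right)^{3}
$$

Here $V_\ell$ is the volume of the part of a view frustum with solid angle $\Omega$ whose samples are routed to level $\ell$, that is, the depths $d \in [g_\ell / k,\, 2 g_\ell / k)$. The ratio $V_\ell / s_\ell^{3}$ does not depend on $\ell$, so each level contributes roughly a constant number of nodes, and a tree with $L$ levels needs on the order of $L$ nodes per view. If the scene is covered by $n$ finest-level nodes, $L$ grows like $\log n$, which is consistent with the stated $O(\log n)$ bound.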

Paper Structure (28 sections, 6 equations, 9 figures, 2 tables, 1 algorithm)

Figures (9)

  • Figure 1: Each sampling sphere along the ray is assigned a radius corresponding to its pixel size. The radius is proportional to depth. Depending on the radius, the sampling spheres will be sent to different nodes of the octree to query their densities and colors. The densities and colors are integrated using volume rendering to produce the final color for the pixel.
  • Figure 2: As illustrated in (a), when zooming out, only the root node is required. In (b), when zooming in, only the leaf node is required. In (c), when looking at the horizon, approximately $\mathcal{O}(\log n)$ nodes are required, which is the upper bound for InfNeRF. In (d), by contrast, other methods require all the blocks when zoomed out, resulting in an upper bound of $\mathcal{O}(n)$.
  • Figure 3: Illustration of the distributed octree, drawn as a binary tree for simplicity. The tree can be divided into the root and 2 branches. The root is shared by all devices, as in conventional DDP, while each branch is owned by one device and its weights are not shared. When a sampling sphere descends to a pruned node, its density and color are estimated by the shared parent node.
  • Figure 4: We render a video that starts from a close-up view and zooms out to the full scene, and report the total size of all sub-NeRFs required to render each frame. Thanks to the LoD tree structure, InfNeRF has a much smaller memory footprint.
  • Figure 5: The plots show PSNR at full resolution and across 6 resolutions for 4 methods. InfNeRF outperforms the other methods on both measures.
  • ...and 4 more figures
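
The routing described in the captions of Figures 1 and 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Node`, `add_child`, `octant`, `sphere_radius` and `route`, the grid resolution of 64, the pinhole footprint `depth / focal_px`, and the rule "descend while the node's cell size is coarser than the sphere radius" are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    """Hypothetical octree node; in InfNeRF each node would host its own NeRF."""
    center: Vec3
    size: float                      # edge length of the node's cube
    level: int = 0                   # 0 = root
    children: Dict[int, "Node"] = field(default_factory=dict)  # missing key = pruned


def octant(node: Node, p: Vec3) -> int:
    """Child octant (0..7) containing p; bit i is set when p[i] >= center[i]."""
    return sum(1 << i for i in range(3) if p[i] >= node.center[i])


def add_child(node: Node, idx: int) -> Node:
    """Create the child in octant idx: half the edge length, center shifted by a quarter."""
    q = node.size / 4.0
    center = tuple(c + (q if (idx >> i) & 1 else -q) for i, c in enumerate(node.center))
    child = Node(center, node.size / 2.0, node.level + 1)
    node.children[idx] = child
    return child


def sphere_radius(depth: float, focal_px: float) -> float:
    """Assumed pinhole footprint: the radius grows linearly with depth."""
    return depth / focal_px


def route(root: Node, p: Vec3, radius: float, grid: int = 64) -> Node:
    """Descend while the node's cell size (size / grid) is coarser than the sphere radius."""
    node = root
    while node.size / grid > radius:
        child = node.children.get(octant(node, p))
        if child is None:            # pruned or leaf: the current (parent) node answers
            break
        node = child
    return node


# Tiny example tree: root -> octant 7 -> octant 0; every other octant is pruned.
root = Node(center=(0.0, 0.0, 0.0), size=1024.0)
mid = add_child(root, 7)
leaf = add_child(mid, 0)
```

With this example tree, a sphere of radius 10 at (100, 100, 100) stops at level 1, a sphere of radius 1 at the same point reaches level 2, and a sphere of radius 1 in a pruned octant such as (-100, -100, -100) is answered by the root, mirroring the parent fallback described for Figure 3.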