Global-guided Focal Neural Radiance Field for Large-scale Scene Rendering

Mingqi Shao, Feng Xiong, Hang Zhang, Shuang Yang, Mu Xu, Wei Bian, Xueqian Wang

TL;DR

GF-NeRF addresses the challenge of rendering large-scale scenes with NeRF by introducing a two-stage Global-Focal architecture that leverages a global coarse representation to guide per-block refinement. The global stage provides a continuous scene prior via a fixed global hash encoder, while the focal stage learns block-specific residual features with per-block encoders, fused through a shared decoder to maintain global consistency. A global-guided training strategy, including weighted pixel sampling based on errors from the global stage, focuses learning on high-frequency details without sacrificing scene-wide coherence, demonstrated on aerial and street datasets with superior fidelity over prior block-based methods. This approach enables scalable, adaptable large-scale rendering without requiring scene priors, balancing capacity expansion with cross-block consistency and practical rendering quality.

Abstract

Neural radiance fields (NeRF) have recently been applied to render large-scale scenes. However, their limited model capacity typically results in blurred rendering results. Existing large-scale NeRFs primarily address this limitation by partitioning the scene into blocks, which are subsequently handled by separate sub-NeRFs. These sub-NeRFs, trained from scratch and processed independently, lead to inconsistencies in geometry and appearance across the scene. Consequently, the rendering quality fails to improve significantly despite the expansion of model capacity. In this work, we present the global-guided focal neural radiance field (GF-NeRF), which achieves high-fidelity rendering of large-scale scenes. GF-NeRF uses a two-stage (Global and Focal) architecture and a global-guided training strategy. The global stage obtains a continuous representation of the entire scene, while the focal stage decomposes the scene into multiple blocks and further processes them with distinct sub-encoders. With this two-stage architecture, the sub-encoders only need to be fine-tuned on top of the global encoder, which reduces training complexity in the focal stage while maintaining scene-wide consistency. Spatial information and error information from the global stage also help the sub-encoders focus on crucial areas and capture more details of large-scale scenes. Notably, our approach does not rely on any prior knowledge about the target scene, which makes GF-NeRF adaptable to various large-scale scene types, including street-view and aerial-view scenes. We demonstrate that our method achieves high-fidelity, natural rendering results on various types of large-scale datasets. Our project page: https://shaomq2187.github.io/GF-NeRF/
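
The two-stage idea can be illustrated with a minimal sketch. It assumes a regular grid partition on the ground plane and an additive fusion of global and block features. The abstract specifies neither, so `block_index`, `fused_feature`, and the additive rule are hypothetical, and the encoders are stand-in callables rather than real hash encoders.

```python
def block_index(x, block_size, blocks_per_axis):
    """Map a ground-plane position (x, y) to a block id on a regular grid.

    The regular grid is an assumption made for illustration only.
    """
    ix = min(max(int(x[0] // block_size), 0), blocks_per_axis - 1)
    iy = min(max(int(x[1] // block_size), 0), blocks_per_axis - 1)
    return iy * blocks_per_axis + ix


def fused_feature(x, global_encoder, focal_encoders, block_size, blocks_per_axis):
    """Combine the global feature with the residual of the block containing x.

    global_encoder: callable returning a feature list; treated as frozen here.
    focal_encoders: one callable per block, returning a residual feature list.
    The element-wise sum is an assumed fusion rule, not the paper's stated one.
    """
    global_feat = global_encoder(x)
    residual = focal_encoders[block_index(x, block_size, blocks_per_axis)](x)
    return [g + r for g, r in zip(global_feat, residual)]
```

Under this assumption, a sub-encoder that outputs zeros reproduces the global feature exactly, so the focal stage would start from the global solution and only needs to learn corrections. This is one way to read the abstract's statement that sub-encoders only need fine-tuning.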

Paper Structure (16 sections, 10 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: Different approaches of expanding NeRF capacity for large-scale scenes. (a) Expanding model capacity with independent sub-NeRFs and blending their colors through post-processing. (b) Introducing global information to guide the training of multiple blocks, expanding capacity while maintaining global consistency.
  • Figure 2: Overview of our framework. A two-stage architecture is adopted to model large-scale scenes. The global stage learns a coarse representation of the target scene, and the focal stage refines high-frequency details with guidance from the global stage.
  • Figure 3: Different pixel sampling strategies. (a) Uniform pixel sampling, used in the global stage. (b) Weighted pixel sampling, used in the focal stage, which concentrates samples on the areas that the global stage rendered poorly.
  • Figure 4: Qualitative comparison results on aerial scenes. Our method demonstrates superior detail rendering compared to Mega-NeRF and Switch-NeRF. We encourage readers to zoom in for a detailed visual comparison.
  • Figure 5: Qualitative comparison between baselines and our GF-NeRF on street scenes. F2-NeRF exhibits the most blurred renderings due to its limited model capacity. Block-NeRF manages to capture some details but still suffers from blurry results. In comparison, our method produces the highest rendering quality in street scenes.
  • ...and 2 more figures
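
The weighted pixel sampling described in Figure 3 can be sketched roughly as follows. This is only an illustration: the per-pixel error values, the `floor` term that keeps low-error pixels reachable, and the function names are assumptions, not details taken from the paper.

```python
import random


def sampling_weights(errors, floor=1e-3):
    """Convert per-pixel global-stage errors into normalized sampling weights.

    `floor` (an assumption) gives every pixel a small non-zero probability.
    """
    raw = [max(e, 0.0) + floor for e in errors]
    total = sum(raw)
    return [r / total for r in raw]


def sample_pixels(errors, n, rng=None, floor=1e-3):
    """Draw n pixel indices with probability proportional to error + floor."""
    rng = rng or random.Random()
    weights = sampling_weights(errors, floor)
    return rng.choices(range(len(errors)), weights=weights, k=n)
```

For example, `sample_pixels([0.0, 0.0, 10.0, 0.0], 8)` draws almost all of its indices from pixel 2, whereas uniform sampling would spread them evenly across the four pixels.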