Aerial-NeRF: Adaptive Spatial Partitioning and Sampling for Large-Scale Aerial Rendering

Xiaohan Zhang; Yukui Qiu; Zhenyu Sun; Qi Liu

TL;DR

Aerial-NeRF tackles the challenge of large-scale aerial view synthesis by introducing adaptive spatial partitioning based on drone poses, pose-similarity-driven region selection for rendering, and adaptive sampling to cover buildings at varying heights. The method trains region-specific NeRFs on a single GPU, enabling fast interactive fly-throughs with substantially reduced sampling and memory requirements. Key contributions include the pose-based partitioning and selection framework, a robust sampling strategy for unbounded aerial spaces, and the SCUTic dataset for uneven drone trajectories; results show rendering over 4× faster than multiple competitors and state-of-the-art quality on multiple large-scale aerial benchmarks. The approach makes aerial scene visualization and 3D reconstruction in large environments more practical, with publicly released data and code.

Abstract

Recent progress in large-scale scene rendering has yielded Neural Radiance Fields (NeRF)-based models with an impressive ability to synthesize small objects and indoor scenes. Nevertheless, extending this idea to large-scale aerial rendering poses two critical problems. First, a single NeRF cannot render an entire scene with high precision for complex large-scale aerial datasets, since the sampling range along each view ray is insufficient to cover buildings adequately. Second, traditional NeRFs are infeasible to train on one GPU for the massive image sets needed to enable interactive fly-throughs. Existing methods therefore typically separate the whole scene into multiple regions and train a NeRF on each region, but they adapt poorly to different flight trajectories and struggle to achieve fast rendering. To that end, we propose Aerial-NeRF with three innovative modifications for jointly adapting NeRF to large-scale aerial rendering: (1) an adaptive spatial partitioning and selection method based on drones' poses, which adapts to different flight trajectories; (2) the use of pose similarity, instead of an (expert) network, to determine which region a new viewpoint belongs to, which speeds up rendering; (3) an adaptive sampling approach that covers entire buildings at different heights, which improves rendering performance. Extensive experiments have been conducted to verify the effectiveness and efficiency of Aerial-NeRF, and new state-of-the-art results have been achieved on two public large-scale aerial datasets and our presented SCUTic dataset. Note that our model renders over 4 times as fast as multiple competitors. Our dataset, code, and model are publicly available at https://drliuqi.github.io/.

Paper Structure (14 sections, 15 equations, 14 figures, 5 tables)

Introduction
Related Work
NeRF for General Outdoor Scenes
NeRF for Large-Scale Aerial Scenes
Method
Neural Radiance Field
Spatial Partitioning and Selection
Adaptive Sampling
Experiments
Datasets
Metrics and Settings
Results
Ablation Studies
Conclusion

Figures (14)

Figure 1: Comparison of different methods for rendering new viewpoints. For a sampling point on the view ray, Mega-NeRF [turki2022mega] uses the NeRFs of all regions traversed by this ray to calculate the color and density of this sampling point, resulting in a slow rendering speed. Switch-NeRF [zhenxing2022switch] applies an expert network to determine which region a sampling point belongs to and applies the corresponding NeRF to calculate its color and density, thereby improving the rendering speed. Our method instead uses existing camera poses to match the new viewpoint to a region, which speeds up rendering. Moreover, ours is more robust to different aerial photography trajectories and achieves higher PSNR.
Figure 2: Comparison between different sampling approaches. "Near" represents the sampling origin along each view ray, and "Far" denotes the sampling end point. The previous sampling method sets the sampling range along each ray as a hyperparameter, which wastes a significant number of sampling points in the air and fails to cover entire buildings within the sampling range.
Figure 3: The pipeline of our method. We propose an adaptive spatial partitioning and selection method that makes our method applicable to aerial datasets with different trajectories. (a) and (b) divide the entire scene into multiple areas based on the poses of drones. Next, we select cameras that can observe the $l$-th region, and use these cameras to train the NeRF of the $l$-th region. (c) selects the cameras in the $l$-th region. (d) selects the boundary cameras that can see the $l$-th region. (e) uses the boundary cameras to select more cameras that can view the $l$-th region. (f) samples between $H_1$ and $H_2$ in bounded space, and samples on buildings to infinity in unbounded space. (g) is the condition for determining whether a boundary camera belongs to the $l$-th region. When the distance $dis$ and angle $\theta$ are within the threshold, this boundary camera can observe the $l$-th region. (h) determines whether the non-boundary cameras outside the $l$-th region belong to this region. We use the similarity equation to find cameras in other regions that are similar to the boundary cameras of the $l$-th region, which indicates that these cameras can also observe the $l$-th region.
Figure 4: Visualization of our spatial selection strategy. The algorithm's input is the pose of a new viewpoint, and the output is the rendered image of this viewpoint. In each region, we find the $n_s=5$ cameras with the smallest $S$ (calculated by Eq. (7)) relative to this viewpoint. The smaller the $S$, the higher the similarity between cameras, and the more common view areas there are. For instance, the scene viewed from the new viewpoint is almost identical to that captured by a camera with $S=1.26$. To avoid randomness, we take the average of the $n_s=5$ cameras' $S$ as the region similarity error $\overline{S}$ between this viewpoint and each region. When $\overline{S}$ is small, the new viewpoint belongs to this region, and the NeRF of this region is used to render it. As can be seen from the images, the smaller the $\overline{S}$, the smaller the difference between the image of the new viewpoint and those of this region's cameras, indicating that the new viewpoint belongs to this space.
Figure 5: Sampling strategy on bounded regions. The intersection of the drone's ray with the outer sphere, $near$, is the starting point for sampling, and the intersection with the Earth, $far$, is the ending point for sampling. Each drone's ray is sampled to cover buildings on the ground.
...and 9 more figures
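
The region selection described in the Figure 4 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `select_region` and the input layout (a dictionary mapping each region to the already computed $S$ values between the new viewpoint and that region's cameras) are assumptions made for this example, and the similarity measure $S$ itself (Eq. (7) of the paper) is not reproduced here.

```python
def select_region(s_by_region, n_s=5):
    """Pick the region whose cameras are most similar to a new viewpoint.

    s_by_region: hypothetical input, a dict mapping a region id to a list of
        similarity errors S between the new viewpoint and each training
        camera of that region (smaller S means more similar).
    n_s: number of most similar cameras averaged per region (5 in Figure 4).
    Returns (best_region, mean_errors), where mean_errors maps each region
    to its averaged error over the n_s smallest S values.
    """
    mean_errors = {}
    for region, s_values in s_by_region.items():
        # Keep only the n_s cameras with the smallest S in this region.
        smallest = sorted(s_values)[:n_s]
        if not smallest:
            raise ValueError(f"region {region!r} has no cameras")
        # Averaging over several cameras reduces the effect of a single
        # accidental match.
        mean_errors[region] = sum(smallest) / len(smallest)
    # The region with the smallest averaged error is used for rendering.
    best_region = min(mean_errors, key=mean_errors.get)
    return best_region, mean_errors


# Made-up S values for two regions, purely for illustration.
scores = {
    "region_0": [1.26, 1.40, 1.55, 1.61, 1.70, 9.0],
    "region_1": [4.1, 4.3, 5.0, 5.2, 6.0],
}
best, errors = select_region(scores)
```

In this toy input the first region has the smaller averaged error, so its NeRF would be the one used to render the new viewpoint. Because the selection only compares poses, it needs no extra network evaluation per sampling point.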
