BirdNeRF: Fast Neural Reconstruction of Large-Scale Scenes From Aerial Imagery
Huiqing Zhang, Yifei Xue, Ming Liao, Yizhen Lao
TL;DR
BirdNeRF targets fast, high-fidelity 3D reconstruction of large-scale scenes from aerial imagery. It decomposes the scene into sub-scenes based on the camera distribution and trains each one independently with Instant-NGP. A projection-guided re-rendering pipeline then fuses the outputs of the relevant sub-models to render novel views, using ground-plane geometry to index and align sub-scenes. This split-unite paradigm yields up to approximately $10\times$ faster reconstruction than Metashape and more than $50\times$ faster reconstruction than current large-scale NeRF approaches on a single GPU, while preserving rendering quality. The approach is robust across datasets ranging from urban to campus scenes, and it is practical for rapid urban modeling, disaster response, and planning applications where memory and time constraints are critical.
Abstract
In this study, we introduce BirdNeRF, an adaptation of Neural Radiance Fields (NeRF) designed specifically for reconstructing large-scale scenes from aerial imagery. Unlike previous research focused on small-scale and object-centric NeRF reconstruction, our approach addresses three challenges: (1) the slow training and rendering associated with large models; (2) the computational demands of modeling a substantial number of images, which require extensive resources such as high-performance GPUs; and (3) the significant artifacts and low visual fidelity commonly observed in large-scale reconstruction tasks due to limited model capacity. Specifically, we present a novel bird-view pose-based spatial decomposition algorithm that decomposes a large aerial image set into multiple small sets with appropriately sized overlaps, allowing us to train an individual NeRF for each sub-scene. This decomposition not only decouples rendering time from scene size but also enables rendering to scale seamlessly to arbitrarily large environments. Moreover, it allows per-block updates of the environment, making the reconstruction process more flexible and adaptable. Additionally, we propose a projection-guided novel view re-rendering strategy, which makes effective use of the independently trained sub-scenes to generate superior rendering results. We evaluate our approach on existing datasets as well as our own drone footage, improving reconstruction speed by 10x over classical photogrammetry software and 50x over a state-of-the-art large-scale NeRF solution, on a single GPU with similar rendering quality.
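To make the idea of a pose-based decomposition with overlaps concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a simple uniform grid over the cameras' ground-plane (x, y) positions, with each cell enlarged by a fractional margin so that neighboring image sets share cameras. The function name `decompose_by_camera_xy` and the parameters `nx`, `ny`, and `overlap` are hypothetical names introduced for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # camera position projected onto the ground plane


def decompose_by_camera_xy(
    cams: List[Point], nx: int, ny: int, overlap: float = 0.2
) -> List[List[int]]:
    """Assign camera indices to nx * ny overlapping ground-plane cells.

    Illustrative assumption: a uniform grid, each cell enlarged on every
    side by `overlap` times the cell size. A camera may fall into several
    cells, which is what creates the shared regions between sub-scenes.
    """
    xs = [c[0] for c in cams]
    ys = [c[1] for c in cams]
    x0, y0 = min(xs), min(ys)
    # Fall back to a unit cell size if all cameras share a coordinate.
    w = (max(xs) - x0) / nx or 1.0
    h = (max(ys) - y0) / ny or 1.0

    blocks: List[List[int]] = []
    for i in range(nx):
        for j in range(ny):
            lo_x, hi_x = x0 + (i - overlap) * w, x0 + (i + 1 + overlap) * w
            lo_y, hi_y = y0 + (j - overlap) * h, y0 + (j + 1 + overlap) * h
            blocks.append(
                [
                    k
                    for k, (x, y) in enumerate(cams)
                    if lo_x <= x <= hi_x and lo_y <= y <= hi_y
                ]
            )
    return blocks


# Five cameras along a straight flight line, split into two sub-scenes.
flight_line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
print(decompose_by_camera_xy(flight_line, nx=2, ny=1, overlap=0.25))
```

Under these assumptions, each returned index list could be used to train one sub-scene model independently, and the camera that appears in both lists marks the overlap between the two blocks.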
