Holistic Inverse Rendering of Complex Facade via Aerial 3D Scanning

Zixuan Xie, Rengan Xie, Rong Li, Kai Huang, Pengju Qiao, Jingsen Zhu, Xu Yin, Qi Ye, Wei Hua, Yuchi Huo, Hujun Bao

TL;DR

The paper tackles holistic inverse rendering of outdoor facades from multi-view aerial imagery, aiming to recover geometry, lighting, and material properties for photorealistic novel-view rendering and editing. It introduces a two-stage pipeline that uses neural implicit signed distance fields with multi-resolution grids to model geometry and separate diffuse and specular appearance, followed by a material decomposition framework aided by adaptive SAM-based segmentation and a frequency-aware SDF regularization. Lighting is captured with an analytic daylight model and a differentiable Monte Carlo renderer, enabling efficient, physically based relighting in outdoor scenes. A drone-derived facade dataset with LiDAR ground truth is provided for training and benchmarking, and experiments show state-of-the-art performance in geometry, material fidelity, relighting, and editing compared to baselines. The combination of geometry- and material-aware neural representations, semantic priors, and analytical lighting yields robust reconstruction and photorealistic rendering for large-scale facades in outdoor environments.

Abstract

In this work, we use multi-view aerial images to reconstruct the geometry, lighting, and material of facades using neural signed distance fields (SDFs). Without requiring complex equipment, our method takes only simple RGB images captured by a drone as input and enables physically based, photorealistic novel-view rendering, relighting, and editing. However, a real-world facade usually has complex appearances, ranging from diffuse rocks with subtle details to large-area glass windows with specular reflections, making it hard to handle everything at once. As a result, previous methods can preserve geometric details but fail to reconstruct smooth glass windows, or vice versa. To address this challenge, we introduce three spatial- and semantic-adaptive optimization strategies: a semantic regularization approach based on zero-shot segmentation techniques to improve material consistency, a frequency-aware geometry regularization to balance surface smoothness and detail across different surfaces, and a visibility probe-based scheme to enable efficient modeling of local lighting in large-scale outdoor environments. In addition, we capture a real-world facade aerial 3D scanning image set and corresponding point clouds for training and benchmarking. Experiments demonstrate the superior quality of our method on facade holistic inverse rendering, novel view synthesis, and scene editing compared to state-of-the-art baselines.

Paper Structure (23 sections, 14 equations, 9 figures, 1 table)

Figures (9)

  • Figure 1: We design a neural rendering pipeline that enables the holistic inverse rendering of facades from aerial images, providing high-quality geometry and material for novel view synthesis, relighting, and editing for downstream applications.
  • Figure 2: Overview of our framework. Our method takes aerial multi-view images as input and reconstructs the full 3D facade, including geometry and material properties, in two stages (two dotted lines). In the first stage, using volumetric rendering, we optimize specular color field $F_s$, diffuse color field $F_c$, and geometry network $F_s$ end-to-end. In the second stage, we decompose light and material by jointly optimizing the analytic daylight model and material field with a differentiable Monte Carlo render layer.
  • Figure 3: Illustration of the SAM loss. A 2D semantic instance is projected to other views to match its corresponding instance, and then the material properties are regularized toward the cluster centers within the same instance.
  • Figure 4: The incident radiance from the sun received by the building from any incident direction $\omega_i$. More discussion in Section \ref{sec:sunlight}.
  • Figure 5: Comparison on two real-world scenes with TensoIR, Nvdiffrec, and NeRF-OSR. We visualize the novel-view results and reconstructed normals. Note that the backgrounds and the trees overlapping the buildings are not our reconstruction targets and are not covered by the scanning.
  • ...and 4 more figures
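
The Figure 3 caption describes regularizing material properties toward cluster centers within a matched semantic instance. The sketch below illustrates that idea only in a simplified form and is not the paper's implementation: it assumes a single center per instance (the mean), a scalar material value per sample such as roughness, and instance labels that have already been matched across views. The function name `instance_consistency_loss` and its arguments are hypothetical.

```python
from collections import defaultdict


def instance_consistency_loss(materials, instance_ids):
    """Mean squared distance of each material sample to its instance center.

    materials    -- list of floats, one material value per sample (assumed scalar)
    instance_ids -- list of ints, segmentation label per sample (assumed given)

    Simplification: each instance has one center, the mean of its samples.
    """
    if not materials:
        return 0.0

    # Group material values by the instance they belong to.
    groups = defaultdict(list)
    for value, label in zip(materials, instance_ids):
        groups[label].append(value)

    # One center per instance: the mean of its samples.
    centers = {label: sum(vals) / len(vals) for label, vals in groups.items()}

    # Penalize deviation of every sample from its own instance center.
    total = sum(
        (value - centers[label]) ** 2
        for value, label in zip(materials, instance_ids)
    )
    return total / len(materials)


if __name__ == "__main__":
    # Instance 0 is inconsistent (0.2 vs 0.4); instance 1 is already uniform.
    print(instance_consistency_loss([0.2, 0.4, 0.9, 0.9], [0, 0, 1, 1]))
```

A penalty of this shape is zero when every sample in an instance shares the same material value and grows as values within an instance diverge, which is the consistency behavior the caption describes. In a differentiable pipeline the same computation would be written with tensor operations so that gradients reach the material field.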