Semantic Neural Radiance Fields for Multi-Date Satellite Data

Valentin Wagner, Sebastian Bullinger, Christoph Bodensteiner, Michael Arens

TL;DR

This work addresses the challenge of building coherent 3D semantic representations from multi-date satellite imagery where labels may be noisy and scenes include transient objects. It introduces a satellite-domain Semantic NeRF that jointly learns color and a 3D semantic field by extending a domain-adapted NeRF with a semantic head and a transient-robust training regime, including an RPC camera model and irradiance-based lighting. Key contributions include (i) a dual-modality NeRF architecture, (ii) improved color stability for transient areas, (iii) demonstration of robustness via multi-view consistency, (iv) a publicly released 71-image, 4-scene, 5-class dataset, and (v) open-source code. The approach achieves semantic accuracy >90% on test views, significantly reduces transient artifacts through targeted regularization, and benefits from multi-view consistency to denoise and complete semantic labels, enabling practical 3D semantic mapping from satellite data.

Abstract

In this work, we propose a satellite-specific Neural Radiance Fields (NeRF) model capable of obtaining a three-dimensional semantic representation (neural semantic field) of the scene. The model derives its output from a set of multi-date satellite images with corresponding pixel-wise semantic labels. We demonstrate the robustness of our approach and its capability to improve noisy input labels. We enhance the color prediction by using the semantic information to address temporal image inconsistencies caused by non-stationary categories such as vehicles. To facilitate further research in this domain, we present a dataset comprising manually generated labels for popular multi-view satellite images. Our code and dataset are available at https://github.com/wagnva/semantic-nerf-for-satellite-data.

Paper Structure

This paper contains 17 sections, 13 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Example output of our proposed satellite-domain-adapted semantic NeRF model. Fusing RGB and semantics allows the scene to be rendered in two distinct modalities for novel views unseen during training. The semantic visualization is enriched with three-dimensional structure by combining it with a learned lighting component.
  • Figure 2: Overview of our proposed model. The satellite-domain-adapted outputs (i.e. elements in the blue area) are combined using an irradiance lighting model to produce the color rendering, as originally proposed by SatNeRF. Using an additional semantic head (i.e. elements in the red area), our proposed method produces a corresponding pixel-wise semantic labeling. We combine this with the learned lighting scalar to create a three-dimensional semantic visualization. We introduce a transient regularization loss $L_t$, based on the semantic input data, to reduce artifacts in the learned appearance.
  • Figure 3: Qualitative results of our proposed NeRF model for popular scenes of the JAX dataset. The pixel-wise semantic annotations in (b) are part of our dataset. The domain-adapted NeRF model learns and reproduces a representation of the scene containing color and semantic information. By combining the predicted semantic class with the learned lighting component, we enrich the visualization with three-dimensional depth cues.
  • Figure 4: Qualitative comparison with and without the proposed transient regularization loss. Using this loss, our model substantially reduces blurry artifacts by guiding the uncertainty $\beta$ at locations of transient objects. Shown here are close-ups of the rendered color image and a visualization of the uncertainty scalar $\beta$ for training views of the scenes JAX_068 and JAX_214. The baseline SatNeRF model is unable to filter out the cars, leaving visible remnants. Our method renders a static, transient-free representation of the scene.
  • Figure 5: Effect of the multi-view consistency on local segmentation errors. The semantic information across all training views is merged in order to improve accuracy. Using a $\mathit{sigmoid}$ activation function as normalization decreases noise in the reconstructed segmentation.
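The semantic rendering described in the captions of Figures 2 and 5 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes standard NeRF compositing weights along a ray, per-sample class logits normalized with a sigmoid (as mentioned for Figure 5), and a per-sample lighting scalar used only to shade the visualization. The function names `volume_weights` and `render_semantics` and the toy inputs are hypothetical.

```python
import numpy as np

def volume_weights(sigma, delta):
    """Standard NeRF compositing weights for the samples of one ray.

    sigma: (N,) volume densities, delta: (N,) distances between samples.
    """
    alpha = 1.0 - np.exp(-sigma * delta)  # opacity of each sample
    # Transmittance: fraction of light that reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
    return trans * alpha

def render_semantics(sigma, delta, sem_logits, shading):
    """Composite per-sample semantics and a lighting scalar along one ray.

    sem_logits: (N, C) raw outputs of an assumed semantic head.
    shading:    (N,) assumed per-sample lighting scalar in [0, 1].
    """
    w = volume_weights(sigma, delta)
    probs = 1.0 / (1.0 + np.exp(-sem_logits))  # sigmoid normalization per class
    class_scores = w @ probs                   # (C,) composited class scores
    label = int(np.argmax(class_scores))       # predicted class for the pixel
    light = float(w @ shading)                 # composited lighting scalar
    return class_scores, label, light

# Toy ray: empty space first, then an opaque surface at the second sample.
sigma = np.array([0.0, 50.0, 50.0])
delta = np.array([1.0, 1.0, 1.0])
sem_logits = np.array([
    [6.0, -6.0, -6.0],   # sample in empty space (receives zero weight)
    [-6.0, -6.0, 6.0],   # surface sample, strongly class 2
    [-6.0, 6.0, -6.0],   # occluded sample behind the surface
])
shading = np.array([1.0, 0.4, 0.9])

class_scores, label, light = render_semantics(sigma, delta, sem_logits, shading)
```

Because the same 3D field is queried from every training view, a label that is wrong in a single image is outvoted by the consistent labels in the others, which is one way to read the multi-view denoising effect shown in Figure 5.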