Efficient Semantic Splatting for Remote Sensing Multi-view Segmentation

Zipeng Qi, Hao Chen, Haotian Zhang, Zhengxia Zou, Zhenwei Shi

TL;DR

The paper targets efficient, accurate multi-view semantic segmentation for remote sensing under sparse labeling by extending Gaussian Splatting to semantic attributes. It combines explicit point-cloud splatting with a one-time rendering pipeline, SAM2-based boundary pseudo-labels, and 2D/3D aggregation losses to enhance view consistency and spatial continuity. Empirical results on CARLA-based synthetic data and Google Maps real data show superior accuracy and dramatically lower latency compared with training- and optimization-based baselines, validating practical applicability in real-world remote sensing pipelines. The approach offers a scalable, label-efficient path toward high-quality multi-view scene understanding and downstream 3D reconstructions.

Abstract

In this paper, we propose a novel semantic splatting approach based on Gaussian Splatting to achieve efficient, low-latency multi-view segmentation. Our method projects the RGB attributes and semantic features of point clouds onto the image plane, simultaneously rendering RGB images and semantic segmentation results. Leveraging the explicit structure of point clouds and a one-time rendering strategy, our approach significantly enhances efficiency during optimization and rendering. Additionally, we employ SAM2 to generate pseudo-labels for boundary regions, which often lack sufficient supervision, and introduce two-level aggregation losses at the 2D feature map and 3D spatial levels to improve view consistency and spatial continuity.
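
To make the idea of rendering RGB and semantics in a single pass concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard front-to-back alpha compositing used in Gaussian Splatting, applied to one pixel whose overlapping splats are already depth-sorted. The function name `composite_pixel` and its inputs (`alphas`, `colors`, `sem_feats`) are hypothetical; the paper's exact formulation is not given in this summary.

```python
import numpy as np

def composite_pixel(alphas, colors, sem_feats):
    """Blend depth-sorted splats at one pixel (front to back).

    alphas:    (N,)   opacity each splat contributes at this pixel, in [0, 1]
    colors:    (N, 3) RGB attribute of each splat
    sem_feats: (N, K) semantic feature (e.g. class logits) of each splat
    Returns the blended RGB, the blended semantic feature, and the weights.
    """
    alphas = np.asarray(alphas, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    sem_feats = np.asarray(sem_feats, dtype=np.float64)

    # Transmittance before splat i: T_i = prod_{j < i} (1 - alpha_j)
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = transmittance * alphas

    # The same weights blend both attributes, so one pass yields both outputs.
    rgb = weights @ colors
    semantics = weights @ sem_feats
    return rgb, semantics, weights

# Two splats covering the same pixel, nearest first (illustrative values).
rgb, semantics, weights = composite_pixel(
    alphas=[0.5, 0.5],
    colors=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    sem_feats=[[1.0, 0.0], [0.0, 1.0]],
)
predicted_class = int(np.argmax(semantics))
```

Because the blending weights depend only on geometry and opacity, they are computed once and reused for every attribute, which is one plausible reading of why a shared rendering pass keeps latency low.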

Paper Structure

This paper contains 21 sections, 13 equations, 9 figures, 6 tables, 1 algorithm.

Figures (9)

  • Figure 1: Our method achieves superior efficiency and accuracy for multi-view segmentation using limited labels. Training-based methods: SegNet, SETR, DeepLab, U-Net. Optimization-based methods: Sem-NeRF, Color-NeRF, IRT, Ours.
  • Figure 2: Overview of the proposed model. Our method is an optimization-based semantic splatting approach for multi-view segmentation in remote sensing. Leveraging an explicit point cloud structure and volume rendering, it achieves high accuracy and low latency in generating RGB and semantic segmentation results (see Sections \ref{sec:CR} and \ref{sec:SR}). Aggregation losses improve spatial generalization (see Section \ref{sec:AL}), while SAM2 generates pseudo-labels for boundary regions, further enhancing boundary segmentation quality (see Section \ref{sec:SPL}).
  • Figure 3: For a target scene with only a few views, e.g., 3 views with supervision labels, some regions will lack supervision. Large circle: The common region covered by supervised views. Small circle: The boundary regions lacking supervision.
  • Figure 4: Input samples from four sub-datasets. Left: six sampled input RGB images. Right: sparse semantic labels; the real scenes have only two labeled views. Scene types range from area level (SYS 1 and REAL 1) to local building level (SYS 5 and REAL 2).
  • Figure 5: Visual results on SYS 1, SYS 3, and SYS 5. Our method is more accurate than the other training-based and optimization-based methods on multi-view segmentation tasks for both complex synthetic and real scenes.
  • ...and 4 more figures