Multi-view Remote Sensing Image Segmentation With SAM priors

Zipeng Qi, Chenyang Liu, Zili Liu, Hao Chen, Yongchang Wu, Zhengxia Zou, Zhenwei Sh

TL;DR

This work tackles multi-view segmentation in remote sensing (RS) when only a handful of views are annotated. It introduces a two-stage approach that first builds an Implicit Neural Field (INF) for scene geometry and appearance, then converts the color attribute into a semantic attribute by incorporating Segment Anything (SAM) priors via a transformer and by generating pseudo-labels for unseen views. The main contributions are (i) leveraging SAM-derived pseudo-labels to provide scene-wide semantic supervision, (ii) integrating SAM features into the INF to enrich semantic information, and (iii) demonstrating improved performance over CNN- and INF-based baselines, particularly for views distant from the training set. The findings suggest that SAM priors can effectively supplement INF-based RS segmentation, improving cross-view consistency with limited labeled data and supporting scalable analysis of large RS scenes.

Abstract

Multi-view segmentation in Remote Sensing (RS) seeks to segment images from diverse perspectives within a scene. Recent methods leverage 3D information extracted from an Implicit Neural Field (INF), bolstering result consistency across multiple views while using a limited number of labels (as few as 3-5) to reduce annotation effort. Nonetheless, achieving superior performance within the constraints of limited-view labels remains challenging due to inadequate scene-wide supervision and insufficient semantic features within the INF. To address these issues, we propose to inject the prior of the visual foundation model Segment Anything (SAM) into the INF to obtain better results with a limited amount of training data. Specifically, we contrast SAM features between testing and training views to derive pseudo-labels for each testing view, augmenting scene-wide labeling information. Subsequently, we introduce SAM features via a transformer into the INF of the scene, supplementing the semantic information. The experimental results demonstrate that our method outperforms mainstream methods, confirming the efficacy of SAM as a supplement to the INF for this task.
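
The abstract does not spell out how SAM features are contrasted between views, so the following is only a minimal sketch of one plausible reading: each test-view feature takes the label of its most similar labeled training-view feature, and weak matches are left unlabeled. The function name `pseudo_labels_from_features`, the cosine-similarity matching, and the `min_sim` threshold are assumptions for illustration and are not taken from the paper. The features here are plain arrays standing in for per-pixel SAM encoder outputs.

```python
import numpy as np

def pseudo_labels_from_features(train_feats, train_labels, test_feats,
                                min_sim=0.0, ignore_index=-1):
    """Nearest-neighbour label transfer (hypothetical helper, not the paper's code).

    train_feats : (N, C) features of labeled training-view pixels
    train_labels: (N,)   integer class ids for those pixels
    test_feats  : (M, C) features of unlabeled test-view pixels
    Returns an (M,) array of pseudo-labels; weak matches get ignore_index.
    """
    # L2-normalise so the dot product below is cosine similarity.
    tr = train_feats / (np.linalg.norm(train_feats, axis=1, keepdims=True) + 1e-8)
    te = test_feats / (np.linalg.norm(test_feats, axis=1, keepdims=True) + 1e-8)

    sim = te @ tr.T                      # (M, N) similarity matrix
    nearest = sim.argmax(axis=1)         # best training match per test pixel
    best = sim[np.arange(sim.shape[0]), nearest]

    labels = train_labels[nearest].copy()
    labels[best < min_sim] = ignore_index  # drop low-confidence matches
    return labels

# Toy example: two labeled features, three test features.
train_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
train_labels = np.array([0, 1])
test_feats = np.array([[0.9, 0.1], [0.1, 0.9], [-1.0, -1.0]])
demo_labels = pseudo_labels_from_features(train_feats, train_labels,
                                          test_feats, min_sim=0.5)
```

In this toy case the third test feature points away from both training features, so it falls below the threshold and is marked with `ignore_index` rather than being forced into a class.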

Paper Structure (9 sections, 6 equations, 2 figures, 1 table)


Figures (2)

  • Figure 1: Our proposed method consists of two stages. Initially, we employ two MLPs to build the scene's INF, encoding 3D information in the density attributes of each spatial point, supervised by all RGB images. Subsequently, we freeze the density MLP and incorporate SAM priors into the INF. This involves transferring the colour attribute by introducing pseudo-labels as additional supervision and injecting SAM features via a transformer.
  • Figure 2: The images on the left showcase results in proximity to the training views, while those on the right depict regions distant from the training views.
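
The two-stage design in the Figure 1 caption relies on geometry (density) learned once and then reused while the per-point attribute changes from color to semantics. The captions do not give the rendering equations, so the sketch below assumes standard NeRF-style volume rendering: per-ray weights are computed from density only, and the same weights composite either RGB values (stage 1) or semantic logits (stage 2). The function names and the toy numbers are hypothetical.

```python
import numpy as np

def render_weights(sigma, deltas):
    """Per-sample compositing weights along one ray (assumed NeRF-style rule).

    sigma : (S,) densities at the S samples (from the density MLP)
    deltas: (S,) distances between consecutive samples
    """
    alpha = 1.0 - np.exp(-sigma * deltas)  # opacity of each sample
    # Transmittance: probability the ray reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    return trans * alpha

def render_ray(sigma, deltas, values):
    """Composite per-sample values (S, K) into one (K,) output for the ray."""
    return render_weights(sigma, deltas) @ values

# Toy ray with three samples: empty space, a solid surface, then something behind it.
sigma = np.array([0.0, 50.0, 5.0])
deltas = np.ones(3)

# Stage 1 (assumed): the same weights composite per-sample RGB.
rgb_samples = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
pixel_rgb = render_ray(sigma, deltas, rgb_samples)

# Stage 2 (assumed): density stays fixed, only the attribute head changes,
# so the identical weights composite per-sample semantic logits instead.
semantic_logits = np.array([[2.0, 0.0], [0.0, 3.0], [4.0, 0.0]])
pixel_logits = render_ray(sigma, deltas, semantic_logits)
pixel_class = int(pixel_logits.argmax())
```

Because the weights depend only on `sigma`, freezing the density MLP keeps the scene geometry fixed across both stages; the occluded third sample contributes almost nothing to either the rendered color or the rendered class.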