Multi-view Remote Sensing Image Segmentation With SAM priors

Zipeng Qi; Chenyang Liu; Zili Liu; Hao Chen; Yongchang Wu; Zhengxia Zou; Zhenwei Sh

TL;DR

This work tackles multi-view segmentation in remote sensing (RS) under very limited annotations. It introduces a two-stage approach that first builds an Implicit Neural Field (INF) for scene geometry and appearance, then transfers color information into semantic attributes by incorporating Segment Anything (SAM) priors via a transformer and by generating pseudo-labels for unseen views. The main contributions are (i) leveraging SAM-derived pseudo-labels to provide scene-wide semantic supervision, (ii) integrating SAM features into the INF to enrich semantic information, and (iii) demonstrating improved performance over CNN- and INF-based baselines, particularly for views distant from the training set. The findings suggest that SAM priors can effectively supplement INF-based RS segmentation, improving cross-view consistency with limited labeled data and supporting scalable analysis of large RS scenes.

Abstract

Multi-view segmentation in Remote Sensing (RS) seeks to segment images from diverse perspectives within a scene. Recent methods leverage 3D information extracted from an Implicit Neural Field (INF), improving the consistency of results across views while using only a limited number of labels (as few as 3-5) to reduce annotation labor. Nonetheless, achieving superior performance with limited-view labels remains challenging due to inadequate scene-wide supervision and insufficient semantic features within the INF. To address these issues, we propose injecting the prior of a visual foundation model, Segment Anything (SAM), into the INF to obtain better results with limited training data. Specifically, we contrast SAM features between testing and training views to derive pseudo-labels for each testing view, augmenting scene-wide labeling information. Subsequently, we introduce SAM features into the scene's INF via a transformer, supplementing its semantic information. The experimental results demonstrate that our method outperforms mainstream methods, confirming the efficacy of SAM as a supplement to the INF for this task.
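The pseudo-labeling step can be pictured with a small sketch. The abstract does not say how SAM features are contrasted between views, so everything below is an assumption made for illustration: per-pixel feature vectors are taken as already extracted, each class is summarized by the mean feature of its labeled training pixels, and a test pixel receives the label of the most similar class by cosine similarity, or an ignore label when no class is similar enough. The function names, the threshold, and the toy 2-D features are hypothetical and are not taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def class_prototypes(features, labels):
    """Average the feature vectors of labeled training pixels per class."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def pseudo_labels(test_features, prototypes, min_similarity=0.5, ignore_label=-1):
    """Give each test pixel the label of its most similar class prototype.

    Pixels whose best similarity does not exceed min_similarity (an assumed
    threshold) get ignore_label so they can be skipped during training.
    """
    result = []
    for feat in test_features:
        best_label, best_sim = ignore_label, min_similarity
        for label, proto in prototypes.items():
            sim = cosine(feat, proto)
            if sim > best_sim:
                best_label, best_sim = label, sim
        result.append(best_label)
    return result


# Toy stand-ins for features from labeled training views (assumed data).
train_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
train_labels = [0, 0, 1]
protos = class_prototypes(train_features, train_labels)

# Toy stand-ins for features from an unlabeled testing view.
test_features = [[0.8, 0.2], [0.1, 0.9], [0.0, 0.0]]
print(pseudo_labels(test_features, protos))  # [0, 1, -1]
```

In this toy run the first two test pixels match classes 0 and 1, while the all-zero feature falls below the threshold and is left unlabeled. Filtering low-confidence matches in this way is one common way to keep noisy pseudo-labels out of the supervision signal; whether the paper does so is not stated in the abstract.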

