Change Detection Between Optical Remote Sensing Imagery and Map Data via Segment Anything Model (SAM)
Hongruixuan Chen, Jian Song, Naoto Yokoya
TL;DR
This work tackles unsupervised multimodal change detection between optical high-resolution imagery and OpenStreetMap (OSM) data by using the Segment Anything Model (SAM) to project both modalities into a shared segmentation domain. It introduces two strategies, no-prompt segmentation and instance-map prompting, to detect general land-cover changes and emergent objects, respectively. The approach leverages SAM's zero-shot capabilities, hierarchical mask aggregation guided by OSM, and instance prompts to bridge the modality gap without labeled data. Experimental results on Aachen, Christchurch, and Vegas show competitive performance against representative unsupervised baselines, highlighting the practical potential of segmentation-domain cross-modality change detection with vision foundation models. Overall, the framework advances unsupervised map–image change detection and points toward extensions into unsupervised semantic change analysis.
Abstract
Unsupervised multimodal change detection is pivotal for time-sensitive tasks and comprehensive multi-temporal Earth monitoring. In this study, we explore unsupervised multimodal change detection between two key remote sensing data sources: optical high-resolution imagery and OpenStreetMap (OSM) data. Specifically, we propose using the vision foundation model Segment Anything Model (SAM) for this task. Leveraging SAM's exceptional zero-shot transfer capability, high-quality segmentation maps of optical images can be obtained, so the two heterogeneous data forms can be compared directly in the so-called segmentation domain. We then introduce two strategies for guiding SAM's segmentation process: the 'no-prompt' and 'box/mask prompt' methods. The two strategies are designed to detect land-cover changes in general scenarios and to identify new land-cover objects within existing backgrounds, respectively. Experimental results on three datasets indicate that the proposed approach achieves more competitive results than representative unsupervised multimodal change detection methods.
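To make the idea of comparing the two modalities in the segmentation domain concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the image segments are already available as boolean masks (for example, the `segmentation` arrays returned by SAM's automatic mask generator) and that the OSM data has been rasterized to an integer label map on the same pixel grid. The function name `segment_change_map`, the `purity_threshold` parameter, and the purity heuristic itself are hypothetical choices made for illustration: a segment that looks homogeneous in the image but spans several OSM classes is treated as evidence of disagreement between map and image.

```python
import numpy as np


def segment_change_map(sam_masks, osm_labels, purity_threshold=0.8):
    """Flag pixels where image segments disagree with the OSM label map.

    sam_masks: iterable of boolean arrays with the same shape as osm_labels.
    osm_labels: 2-D array of non-negative integer class ids (rasterized OSM).
    purity_threshold: hypothetical cutoff; a segment whose dominant OSM class
        covers less than this fraction of its pixels is considered mixed.
    """
    change = np.zeros(osm_labels.shape, dtype=bool)
    for mask in sam_masks:
        labels_in_mask = osm_labels[mask]
        if labels_in_mask.size == 0:
            continue  # empty segment, nothing to compare
        counts = np.bincount(labels_in_mask)
        dominant = counts.argmax()
        purity = counts[dominant] / labels_in_mask.size
        if purity < purity_threshold:
            # Mark the pixels whose OSM class differs from the segment's
            # dominant class as candidate changes.
            change |= mask & (osm_labels != dominant)
    return change


# Toy example: OSM says columns 0-1 are class 1, the rest is class 0.
osm = np.zeros((6, 6), dtype=int)
osm[:, :2] = 1

# Segment A covers the top three rows and straddles both OSM classes.
seg_a = np.zeros((6, 6), dtype=bool)
seg_a[:3, :] = True

# Segment B lies entirely inside the OSM class-1 region.
seg_b = np.zeros((6, 6), dtype=bool)
seg_b[3:, :2] = True

change = segment_change_map([seg_a, seg_b], osm)
print(int(change.sum()))
```

In this toy case only the part of segment A that OSM labels differently from the segment's dominant class is flagged, while segment B, which agrees with the map, is left unchanged. A real pipeline would additionally need the mask aggregation and prompting steps described above, which this sketch does not attempt to reproduce.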
