Segment Using Just One Example

Pratik Vora, Sudipan Saha

TL;DR

This work tackles one-shot semantic segmentation in Earth observation by using a single example image with a known mask to segment the same target in a query image without any training. It leverages Segment Anything (SAM) with four image-based, text-free prompt strategies applied to a stitched key–query image, and combines multiple SAM runs via ensemble and confidence-weighted aggregation, followed by morphological post-processing. The approach is evaluated on building and car segmentation from the ISPRS Potsdam dataset, demonstrating that building segmentation benefits from the method while car segmentation remains challenging due to smaller object size; the method outperforms a fine-tuned UNet baseline in this setting. This indicates the potential of foundation-model-based, data-efficient segmentation for rapid, deployment-ready Earth observation tasks, with future work aimed at improving small-object segmentation and handling lower-resolution imagery.

Abstract

Semantic segmentation is an important topic in computer vision with many relevant applications in Earth observation. While supervised methods exist, the constraint of limited annotated data has encouraged the development of unsupervised approaches. However, existing unsupervised methods resemble clustering and cannot be directly mapped to explicit target classes. In this paper, we address single-shot semantic segmentation, where one example of the target class is provided and is used to segment the target class from query/test images. Our approach exploits the recently popular Segment Anything (SAM), a promptable foundation model. We design several techniques to automatically generate prompts from the single example/key image such that segmentation is achieved on a stitch (concatenation) of the example/key and query/test images. The proposed technique does not involve any training phase and requires just one example image to grasp the concept. Furthermore, no text-based prompt is required. We evaluate the proposed techniques on the building and car classes.
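
The stitching and prompt-generation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`stitch_and_sample_prompts`, `split_query_mask`), the random sampling of points, the side-by-side layout with the key on the left, and the assumption of equally sized images are all choices made here for illustration. The points are returned as (x, y) pixel coordinates with a label of 1 for positive prompts, which is the convention used by point-prompt interfaces such as `SamPredictor.predict` in the `segment_anything` package; the SAM call itself is left out.

```python
import numpy as np

def stitch_and_sample_prompts(key_img, key_mask, query_img, n_points=5, seed=0):
    """Stitch key and query side by side and sample positive point prompts
    from the known key mask (hypothetical helper, illustration only)."""
    if key_img.shape != query_img.shape:
        raise ValueError("this sketch assumes equally sized key and query images")
    # Key image on the left, query image on the right.
    stitched = np.concatenate([key_img, query_img], axis=1)

    # Pixel positions where the key mask marks the target class.
    rows, cols = np.nonzero(key_mask)
    if rows.size == 0:
        raise ValueError("key mask contains no target pixels")

    rng = np.random.default_rng(seed)
    idx = rng.choice(rows.size, size=min(n_points, rows.size), replace=False)

    # (x, y) coordinates; the key occupies the left half, so no offset is needed.
    points = np.stack([cols[idx], rows[idx]], axis=1)
    labels = np.ones(len(idx), dtype=int)  # 1 = positive prompt
    return stitched, points, labels

def split_query_mask(stitched_mask, key_width):
    """Keep only the part of a stitched-image mask that covers the query."""
    return stitched_mask[:, key_width:]
```

A promptable segmenter would then be run on `stitched` with `points` and `labels`, and `split_query_mask` would cut the resulting mask back to the query half.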

Paper Structure (23 sections, 4 equations, 7 figures, 2 tables)


Figures (7)

  • Figure 1: Outline of the proposed method: given a key/example image with a known segmentation mask and a query image, the proposed method concatenates them and feeds them to the SAM model as if they were a single image. Image-based prompts are also fed to SAM, which enables us to obtain the segmentation mask for the stitched image and hence for the query image.
  • Figure 2: SAM pipeline (green: positive prompt, red: negative prompt). The image encoder obtains a representation of the image, whereas the prompt encoder obtains a representation of the prompt inputs. Using this information, the decoder obtains the segmented image.
  • Figure 3: Proposed prompt techniques: (a) key-only prompts, (b) key prompts and positive prompts from query/test, (c) negative prompts from key and positive prompts from query/test, and (d) masked key prompts and positive prompts from query/test. The left image is the key image and the right image is the query/test image. Positive prompts are shown in green and negative prompts in red.
  • Figure 4: Building detection on the stitched image (left: key image, right: query/test image) using the four proposed prompts shown in sub-figures (a), (b), (c) and (d).
  • Figure 5: Building segmentation masks for two different confidence scores: (a) High - 0.8, (b) Low - 0.4.
  • ...and 2 more figures
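
The TL;DR mentions combining multiple SAM runs via confidence-weighted aggregation followed by morphological post-processing. A minimal sketch of that kind of pipeline is given below. It is an assumption-laden illustration rather than the paper's procedure: the weighted-vote rule, the 0.5 threshold, the 3x3 structuring element, the choice of morphological opening, and the function names are all hypothetical.

```python
import numpy as np

def aggregate_masks(masks, scores, threshold=0.5):
    """Confidence-weighted vote over binary masks from several runs.
    masks: (n_runs, H, W) array-like, scores: one confidence per run."""
    masks = np.asarray(masks, dtype=float)
    weights = np.asarray(scores, dtype=float)
    if weights.sum() <= 0:
        raise ValueError("scores must sum to a positive value")
    weights = weights / weights.sum()
    # Weighted average of the masks, one value in [0, 1] per pixel.
    soft = np.tensordot(weights, masks, axes=1)
    return soft >= threshold

def _windows(mask):
    """The nine 3x3-neighbourhood shifts of a mask (outside counts as background)."""
    padded = np.pad(mask.astype(bool), 1, constant_values=False)
    h, w = mask.shape
    return [padded[r:r + h, c:c + w] for r in range(3) for c in range(3)]

def erode(mask):
    # A pixel survives only if its whole 3x3 neighbourhood is foreground.
    return np.logical_and.reduce(_windows(mask))

def dilate(mask):
    # A pixel is set if any pixel in its 3x3 neighbourhood is foreground.
    return np.logical_or.reduce(_windows(mask))

def opening(mask):
    """Morphological opening (erosion then dilation) to remove small speckles."""
    return dilate(erode(mask))
```

With this rule, a run with low confidence contributes little to the final mask, and isolated pixels smaller than the structuring element are removed by the opening step.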