Learnable Prompt for Few-Shot Semantic Segmentation in Remote Sensing Domain
Steve Andreas Immanuel, Hagai Raja Sinulingga
TL;DR
This work tackles generalized few-shot semantic segmentation in remote sensing by using SegGPT as a foundation model and introducing a separate learnable prompt for each novel class, trained on the scarce support data. To handle multi-scale objects, the method makes patch-based predictions and fuses them with a patch-and-stitch step framed as image inpainting; image-similarity search guides prompt selection, and a novel-class filtering step reduces false positives. On the OpenEarthMap GFSS dataset, the method raises the validation weighted mIoU of a simple fine-tuned SegGPT from 15.96 to 35.08 and achieves 36.52 on the test set, with ablations confirming the contribution of prompts, patch-based fusion, and filtering. The approach offers a simple, extensible, and computationally viable path for incorporating new classes in remote-sensing segmentation without extensive re-training of base models.
Abstract
Few-shot segmentation is the task of segmenting objects or regions of novel classes within an image given only a few annotated examples. In the generalized setting, the task extends to segmenting both the base and the novel classes. The main challenge is training the model so that adding novel classes does not degrade performance on the base classes, a problem known as catastrophic forgetting. To mitigate this issue, we use SegGPT as our base model and train it on the base classes. Then, we use separate learnable prompts to handle predictions for each novel class. To handle the wide range of object sizes typical of the remote sensing domain, we perform patch-based prediction. To address the discontinuities along patch boundaries, we propose a patch-and-stitch technique that re-frames the problem as an image inpainting task. During inference, we also use image similarity search over image embeddings for prompt selection, and novel class filtering to reduce false positive predictions. In our experiments, the proposed method boosts the weighted mIoU of a simple fine-tuned SegGPT from 15.96 to 35.08 on the validation set of the few-shot OpenEarthMap dataset provided in the challenge.
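As a rough illustration of the prompt-selection step, the sketch below ranks support images by cosine similarity between their embeddings and the embedding of a query image, and returns the closest ones. The abstract does not specify the embedding model, the similarity metric, or how many prompts are selected, so the cosine metric, the parameter `k`, and the helper name `select_prompts` are all assumptions made for this example rather than the authors' implementation.

```python
import numpy as np

def select_prompts(query_emb, support_embs, k=1):
    """Return indices and scores of the k support embeddings closest to the query.

    Hypothetical helper: cosine similarity and top-k selection are assumptions,
    not details taken from the paper.
    """
    # Normalize to unit length so the dot product equals cosine similarity.
    query = query_emb / np.linalg.norm(query_emb)
    support = support_embs / np.linalg.norm(support_embs, axis=1, keepdims=True)
    sims = support @ query
    # Sort by descending similarity and keep the k best candidates.
    top_k = np.argsort(-sims)[:k]
    return top_k, sims[top_k]

# Toy 2-D embeddings standing in for real image embeddings.
support_embs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query_emb = np.array([0.9, 0.1])
indices, scores = select_prompts(query_emb, support_embs, k=2)
print(indices, scores)
```

In practice the embeddings would come from an image encoder, and the selected support images (with their masks) would serve as prompts for the query image.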
