Tackling Few-Shot Segmentation in Remote Sensing via Inpainting Diffusion Model

Steve Andreas Immanuel, Woojin Cho, Junhyuk Heo, Darongsae Kwon

TL;DR

The paper tackles data scarcity in remote-sensing segmentation by introducing an image-conditioned inpainting diffusion pipeline that synthesizes diverse novel-class instances conditioned on limited examples. It filters generated content for semantic fidelity with CLIP-style cosine similarity and refines masks with SAM, producing high-quality annotations for training. By fine-tuning a diffusion model on remote-sensing data and using generated samples to train off-the-shelf segmentation models, the method achieves substantial performance gains in low-data regimes across multiple architectures, sometimes rivaling challenge-winning solutions. The approach is simple, versatile, and potentially transferable to other domains where annotated data are scarce.

Abstract

Limited data is a common problem in remote sensing due to the high cost of obtaining annotated samples. In the few-shot segmentation task, models are typically trained on base classes with abundant annotations and later adapted to novel classes with limited examples. However, this often necessitates specialized model architectures or complex training strategies. Instead, we propose a simple approach that leverages diffusion models to generate diverse variations of novel-class objects within a given scene, conditioned on the limited examples of the novel classes. By framing the problem as an image inpainting task, we synthesize plausible instances of novel classes under various environments, effectively increasing the number of samples for the novel classes and mitigating overfitting. The generated samples are then assessed using a cosine similarity metric to ensure semantic consistency with the novel classes. Additionally, we employ the Segment Anything Model (SAM) to segment the generated samples and obtain precise annotations. By using high-quality synthetic data, we can directly fine-tune off-the-shelf segmentation models. Experimental results demonstrate that our method significantly enhances segmentation performance in low-data regimes, highlighting its potential for real-world remote sensing applications.
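
The cosine-similarity check described above can be illustrated with a minimal sketch. It assumes image embeddings have already been computed by some encoder (the TL;DR mentions a CLIP-style similarity). The function name `filter_by_similarity`, the threshold value, and the choice to score each candidate against its best-matching reference are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def filter_by_similarity(reference_embeddings, candidate_embeddings, threshold=0.8):
    """Return indices of candidates whose best cosine similarity to any
    reference embedding is at least `threshold`.

    Hypothetical helper: the threshold and the max-over-references rule
    are assumptions made for illustration only.
    """
    refs = np.asarray(reference_embeddings, dtype=float)
    cands = np.asarray(candidate_embeddings, dtype=float)

    # L2-normalise rows so that a dot product equals cosine similarity.
    refs = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    cands = cands / np.linalg.norm(cands, axis=1, keepdims=True)

    # sims[i, j] = cosine similarity between candidate i and reference j.
    sims = cands @ refs.T

    # Score each candidate by its closest reference example.
    best = sims.max(axis=1)
    return [i for i, score in enumerate(best) if score >= threshold]


if __name__ == "__main__":
    # Toy 2-D "embeddings": one reference and three generated candidates.
    references = [[1.0, 0.0]]
    candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(filter_by_similarity(references, candidates, threshold=0.7))
```

In this toy example the second candidate is orthogonal to the reference and is discarded, while the other two pass the 0.7 threshold. In practice the threshold would need to be tuned on real embeddings.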

Paper Structure

This paper contains 19 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Overall pipeline of the proposed approach. An inpainting diffusion model generates novel-class samples, and SAM refines the segmentation masks. The results are used as training samples to improve model performance in few-shot settings.
  • Figure 2: Failure case where the generated object is excessively large due to the large mask area.
  • Figure 3: Comparison of inpainting methods for remote sensing: masked image (left), copy-paste (middle), and our method (right), showcasing realistic inpainted images for boats, agricultural land, bridges, and sports fields.