Tackling Few-Shot Segmentation in Remote Sensing via Inpainting Diffusion Model
Steve Andreas Immanuel, Woojin Cho, Junhyuk Heo, Darongsae Kwon
TL;DR
The paper tackles data scarcity in remote-sensing segmentation by introducing an image-conditioned inpainting diffusion pipeline that synthesizes diverse novel-class instances conditioned on limited examples. It filters generated content for semantic fidelity with CLIP-style cosine similarity and refines masks with SAM, producing high-quality annotations for training. By fine-tuning a diffusion model on remote-sensing data and using generated samples to train off-the-shelf segmentation models, the method achieves substantial performance gains in low-data regimes across multiple architectures, sometimes rivaling challenge-winning solutions. The approach is simple, versatile, and potentially transferable to other domains where annotated data are scarce.
Abstract
Limited data is a common problem in remote sensing due to the high cost of obtaining annotated samples. In the few-shot segmentation task, models are typically trained on base classes with abundant annotations and later adapted to novel classes with limited examples. However, this often necessitates specialized model architectures or complex training strategies. Instead, we propose a simple approach that leverages diffusion models to generate diverse variations of novel-class objects within a given scene, conditioned on the limited examples of the novel classes. By framing the problem as an image inpainting task, we synthesize plausible instances of novel classes in various environments, effectively increasing the number of samples for the novel classes and mitigating overfitting. The generated samples are then assessed using a cosine similarity metric to ensure semantic consistency with the novel classes. Additionally, we employ the Segment Anything Model (SAM) to segment the generated samples and obtain precise annotations. With this high-quality synthetic data, we can directly fine-tune off-the-shelf segmentation models. Experimental results demonstrate that our method significantly enhances segmentation performance in low-data regimes, highlighting its potential for real-world remote sensing applications.
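The cosine-similarity filtering step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each generated crop and each few-shot reference example has already been embedded by some image encoder, and the function names, the max-over-references aggregation, and the threshold of 0.8 are all hypothetical choices made for illustration.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (G, D) and rows of b (R, D)."""
    a_unit = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), eps)
    b_unit = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)
    return a_unit @ b_unit.T  # shape (G, R)


def filter_generated(gen_emb: np.ndarray, ref_emb: np.ndarray, threshold: float = 0.8):
    """Keep generated samples whose embedding is close to any reference example.

    gen_emb:   (G, D) embeddings of generated novel-class crops (assumed precomputed).
    ref_emb:   (R, D) embeddings of the few-shot reference examples (assumed precomputed).
    threshold: hypothetical cutoff; a real pipeline would tune this value.
    """
    sims = cosine_similarity(gen_emb, ref_emb)
    best = sims.max(axis=1)   # assumption: score each sample by its best-matching reference
    keep = best >= threshold  # boolean mask over the generated samples
    return keep, best


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real encoder outputs.
    ref = np.array([[1.0, 0.0], [0.0, 1.0]])
    gen = np.array([[2.0, 0.0], [1.0, 1.0], [-1.0, 0.0]])
    keep, best = filter_generated(gen, ref, threshold=0.8)
    print(keep, best)
```

Samples that pass the mask would then be handed to the mask-refinement step (SAM in the paper) before being added to the training set. Scoring against the mean reference embedding instead of the best match is an equally plausible design choice.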
