Exploring the Capability of Text-to-Image Diffusion Models with Structural Edge Guidance for Multi-Spectral Satellite Image Inpainting
Mikolaj Czerkawski, Christos Tachtatzis
TL;DR
This paper evaluates the utility of text-to-image diffusion models for multispectral satellite image (MSI) inpainting by proposing a two-stage pipeline: RGB inpainting via StableDiffusion with ControlNet edge guidance, followed by a Deep Image Prior-based RGB-to-MSI translation that recovers the remaining bands. Empirical results show that diffusion-based inpainting, even with structural guidance, often yields artifacts, and that a simple internal inpainting approach using historical data delivers higher-quality MSI reconstructions. The work also demonstrates a flexible RGB-to-MSI transfer approach that is zero-shot, although its output quality depends on the accuracy of the RGB input, providing a practical path to extend RGB priors to 13-band Sentinel-2 data. Overall, the findings point to limitations of current general-purpose diffusion models for satellite MSI restoration, while still offering a viable pipeline for combining RGB priors with historical structure. The study can thus inform the design of more robust MSI restoration techniques and of data augmentation strategies based on text-conditioned generative models.
Abstract
This letter investigates the utility of text-to-image inpainting models for satellite image data. It addresses two technical challenges: injecting structural guiding signals into the generative process, and translating the inpainted RGB pixels to a wider set of MSI bands. To this end, it introduces a novel inpainting framework based on StableDiffusion and ControlNet, together with a novel method for RGB-to-MSI translation. Results on a wider set of data suggest that inpainting synthesized via StableDiffusion suffers from undesired artifacts, and that a simple alternative, self-supervised internal inpainting, achieves a higher quality of synthesis.
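To make the second stage more concrete, the following is a minimal sketch of the RGB-to-MSI idea: a mapping from RGB to all bands is fitted on the pixels that are known, then applied to the inpainted pixels. It is not the method of the letter, which is based on Deep Image Prior; a per-pixel linear least-squares fit is swapped in here purely for illustration. The function name `rgb_to_msi_linear`, the array layout, and the mask convention are all assumptions.

```python
import numpy as np


def rgb_to_msi_linear(rgb, msi, mask):
    """Fill masked MSI pixels from RGB with a linear map (illustrative stand-in).

    rgb:  (H, W, 3) array, already inpainted, so valid everywhere.
    msi:  (H, W, C) array, valid only where mask is False.
    mask: (H, W) boolean array, True marks the missing (inpainted) pixels.
    """
    known = ~mask

    # Design matrix for known pixels: RGB values plus a bias column.
    x_known = np.concatenate([rgb[known], np.ones((known.sum(), 1))], axis=1)
    y_known = msi[known]

    # Least-squares fit of a (4, C) matrix mapping [R, G, B, 1] to all bands.
    weights, *_ = np.linalg.lstsq(x_known, y_known, rcond=None)

    # Apply the fitted map to the inpainted RGB pixels only.
    x_missing = np.concatenate([rgb[mask], np.ones((mask.sum(), 1))], axis=1)
    out = msi.copy()
    out[mask] = x_missing @ weights
    return out
```

Because the prediction in the masked region is a function of the inpainted RGB values alone, any artifact in the RGB inpainting propagates directly into the other bands, which matches the dependence on RGB accuracy noted in the TL;DR.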
