Exploring the Capability of Text-to-Image Diffusion Models with Structural Edge Guidance for Multi-Spectral Satellite Image Inpainting

Mikolaj Czerkawski, Christos Tachtatzis

TL;DR

This paper evaluates the utility of text-to-image diffusion models for multispectral satellite image inpainting by proposing a two-stage pipeline: RGB inpainting via StableDiffusion with ControlNet edge guidance, followed by a Deep Image Prior-based RGB-to-MSI translation to recover the remaining bands. Empirical results show that diffusion-based inpainting, even with structural guidance, often yields artifacts, and a simple internal inpainting using historical data delivers higher-quality MSI reconstructions. The work also demonstrates a flexible RGB-to-MSI transfer approach that is zero-shot and dependent on RGB accuracy, providing a practical path to extend RGB priors to 13-band Sentinel-2 data. Overall, the findings suggest limitations of current general-purpose diffusion models for satellite MSI restoration but offer a viable pipeline for leveraging RGB priors and historical structure for improved inpainting. The study thus informs the design of more robust MSI restoration techniques and potential data augmentation strategies using text-conditioned generative models.
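
The two-stage pipeline starts from the RGB subset of a 13-band Sentinel-2 cube and, at the end, should only overwrite pixels inside the missing region. The sketch below illustrates those two glue steps with NumPy. It assumes the standard Sentinel-2 L1C band order (B1 to B12, including B8A), so true-colour RGB is (B4, B3, B2). The brightness divisor of 3000 and the function names `msi_to_rgb` and `composite` are hypothetical choices for illustration, not values or code from the paper.

```python
import numpy as np

# Assumed Sentinel-2 L1C band order: B1, B2, B3, B4, B5, B6, B7, B8, B8A, B9, B10, B11, B12.
# True-colour RGB is (B4, B3, B2), i.e. zero-based indices (3, 2, 1) in that order.
RGB_IDX = [3, 2, 1]


def msi_to_rgb(msi: np.ndarray, scale: float = 3000.0) -> np.ndarray:
    """Return a (3, H, W) true-colour image in [0, 1] from a (13, H, W) cube.

    `scale` is a hypothetical brightness divisor for L1C digital numbers.
    """
    rgb = msi[RGB_IDX].astype(np.float64) / scale
    return np.clip(rgb, 0.0, 1.0)


def composite(observed: np.ndarray, predicted: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep observed pixels and use the prediction only where mask is True.

    observed, predicted: (C, H, W); mask: (H, W) bool, True where pixels are missing.
    """
    # Broadcast the 2-D mask over the channel axis.
    return np.where(mask[None, :, :], predicted, observed)
```

In this sketch the diffusion stage would receive `msi_to_rgb(msi)` (rescaled to whatever input range the model expects), and `composite` would be applied to the final 13-band output so that observed pixels are never altered.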

Abstract

The letter investigates the utility of text-to-image inpainting models for satellite image data. It addresses two technical challenges, namely injecting structural guiding signals into the generative process and translating the inpainted RGB pixels to a wider set of MSI bands, by introducing a novel inpainting framework based on StableDiffusion and ControlNet and a novel method for RGB-to-MSI translation. The results on a wider set of data suggest that inpainting synthesized via StableDiffusion suffers from undesired artifacts and that a simple alternative, self-supervised internal inpainting, achieves higher synthesis quality.
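
One plausible way to phrase the RGB-to-MSI translation as a Deep Image Prior fit is a masked reconstruction loss: outside the hole all 13 bands are observed, while inside it only the inpainted RGB is available, so the remaining bands there are left for the network prior to fill in. This formulation is an assumption for illustration; the exact objective used in the letter is not stated here. The function name `dip_rgb_to_msi_loss` and its argument layout are hypothetical, and `pred` stands for the output of whatever convolutional network is being fitted.

```python
import numpy as np

# Assumed Sentinel-2 L1C band order: B1, B2, B3, B4, B5, B6, B7, B8, B8A, B9, B10, B11, B12.
# Rows of `rgb_inpainted` are taken to be (R, G, B) = (B4, B3, B2).
RGB_IDX = [3, 2, 1]


def dip_rgb_to_msi_loss(pred: np.ndarray, msi: np.ndarray,
                        rgb_inpainted: np.ndarray, mask: np.ndarray) -> float:
    """Hypothetical masked MSE for fitting a DIP network in the RGB-to-MSI stage.

    pred:          (13, H, W) output of the network being fitted
    msi:           (13, H, W) observed cube; values under the mask are ignored
    rgb_inpainted: (3, H, W)  output of the diffusion inpainting stage
    mask:          (H, W) bool, True where pixels are missing
    """
    known = ~mask
    # Outside the hole every band is observed, so all 13 channels are supervised.
    err_known = (pred[:, known] - msi[:, known]) ** 2
    # Inside the hole only the inpainted RGB exists, so only those 3 bands are supervised;
    # the other 10 bands there are left to the network prior.
    err_hole = (pred[RGB_IDX][:, mask] - rgb_inpainted[:, mask]) ** 2
    n = err_known.size + err_hole.size
    return float((err_known.sum() + err_hole.sum()) / n)
```

In a full DIP fit this loss would be minimised over the network weights for a fixed random input, and the non-RGB bands under the mask would then be read from the final prediction.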

Paper Structure (10 sections, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Complete pipeline for multi-spectral satellite image inpainting. The process consists of two steps: a pre-trained diffusion model is first applied to RGB data for inpainting (text-to-image RGB inpainting), and then a Deep Image Prior Ulyanov2020 approach is used to transfer the reconstruction to the non-RGB channels (Deep Image Prior RGB-to-MSI Completion). This allows any RGB-based pre-trained model, such as StableDiffusion Rombach2022, to be incorporated into a pipeline capable of multi-spectral satellite image inpainting.
  • Figure 2: The Edge-Guided Inpainting diffusion pipeline used for this work employs a ControlNet approach Zhang2023, with an inpainting StableDiffusion backbone.
  • Figure 3: Comparison of the two methods of filling the masked region in the input to the diffusion models. For reference, the output of the StableDiffusion Inpainting scheme obtained with each filling method is also shown.
  • Figure 4: RGB visualization of 4 random samples drawn from the test dataset and the corresponding output from each method. Direct-DIP struggles to produce good-quality inpainting without an additional source of information, yielding visually incoherent output. The text-based models produce visually coherent yet inaccurate inpaintings, despite the efforts to inject correct structural information into the process.
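
The ControlNet conditioning in the edge-guided pipeline of Figure 2 is an edge map. ControlNet's edge-conditioned models are commonly driven by Canny edges (for example from OpenCV's `cv2.Canny`); as a dependency-free stand-in, the sketch below thresholds a normalised Sobel gradient magnitude instead, which is a simpler detector without non-maximum suppression or hysteresis. It assumes the structural guide is extracted from a cloud-free RGB view of the same location, such as a historical acquisition; that source, the names `sobel_edges` and `to_control_image`, and the 0.2 threshold are all hypothetical.

```python
import numpy as np


def sobel_edges(gray: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Binary edge map from a (H, W) grayscale image (a stand-in for Canny).

    The gradient magnitude is normalised to [0, 1] before thresholding;
    `threshold` is a hypothetical default.
    """
    p = np.pad(gray.astype(np.float64), 1, mode="edge")
    # Sobel x: right column minus left column, rows weighted 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Sobel y: bottom row minus top row, columns weighted 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    mag = np.hypot(gx, gy)
    peak = mag.max()
    if peak > 0:
        mag = mag / peak
    return mag > threshold


def to_control_image(edges: np.ndarray) -> np.ndarray:
    """ControlNet-style conditioning image: white edges on black, (H, W, 3) uint8."""
    img = edges.astype(np.uint8) * 255
    return np.repeat(img[:, :, None], 3, axis=2)
```

A (3, H, W) RGB array could be reduced to grayscale with `rgb.mean(axis=0)` before calling `sobel_edges`; the resulting control image would then be passed to the ControlNet branch alongside the masked RGB input.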