Data Augmentation in Earth Observation: A Diffusion Model Approach

Tiago Sousa, Benoît Ries, Nicolas Guelfi

TL;DR

The paper tackles data scarcity and limited semantic diversity in Earth Observation (EO) imagery for AI tasks. It introduces a four-stage, diffusion-based data augmentation pipeline: meta-prompts for instruction generation, vision-language captioning, domain-adapted diffusion fine-tuning via LoRA, and prompt-guided image generation to enrich EO datasets. On the EuroSAT dataset, zero-shot accuracy rises to $58.07\%$ (+$16.97\%$) for CLIP-RN50 and to $69.23\%$ (+$19.83\%$) for CLIP-ViT-B/32, with higher top-1 and top-3 accuracy than the baselines and AutoAugment. The results suggest that diffusion-based EO augmentation can improve robustness in low-data regimes. The paper also notes limitations in domain-specific captioning and points to hybrid augmentation and more advanced captioning models as future directions.

Abstract

High-quality Earth Observation (EO) imagery is essential for accurate analysis and informed decision-making across sectors. However, data scarcity caused by atmospheric conditions, seasonal variations, and limited geographical coverage hinders the effective application of Artificial Intelligence (AI) in EO. Traditional data augmentation techniques, which rely on basic parameterized image transformations, often fail to introduce sufficient diversity across key semantic axes. These axes include natural changes such as snow and floods, human impacts such as urbanization and roads, and disasters such as wildfires and storms. This lack of diversity limits the accuracy of AI models in EO applications. To address this, we propose a four-stage data augmentation approach that integrates diffusion models to enhance semantic diversity. Our method employs meta-prompts for instruction generation, vision-language models for rich captioning, EO-specific diffusion model fine-tuning, and iterative data augmentation. Extensive experiments using four augmentation techniques demonstrate that our approach consistently outperforms established methods, generating semantically diverse EO images and improving AI model performance.
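
The four stages named in the abstract can be read as a simple data flow. The sketch below is only an illustration of that flow under stated assumptions, not the authors' implementation: `Sample`, `augment`, and the three injected callables (`make_instruction`, `caption_image`, `fine_tune`) are hypothetical names, the choice of one instruction per class label is an assumption, and the real models (language model, vision-language captioner, LoRA-tuned diffusion model) are left as pluggable functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Sample:
    image: object   # placeholder: an image array, a file path, etc.
    label: str      # class name, e.g. a EuroSAT category
    caption: str = ""


def augment(
    samples: Sequence[Sample],
    make_instruction: Callable[[str], str],
    caption_image: Callable[[object, str], str],
    fine_tune: Callable[[Sequence[Sample]], Callable[[str], object]],
    prompts_per_sample: int = 1,
) -> List[Sample]:
    """Hypothetical four-stage flow; every stage is an injected callable."""
    # Stage 1 (assumption: one instruction per class label):
    # a meta-prompt yields a captioning instruction.
    instructions = {s.label: make_instruction(s.label) for s in samples}

    # Stage 2: a vision-language model captions each real image.
    captioned = [
        Sample(s.image, s.label, caption_image(s.image, instructions[s.label]))
        for s in samples
    ]

    # Stage 3: fine-tune a diffusion model on (image, caption) pairs;
    # the returned callable maps a text prompt to a generated image.
    generate = fine_tune(captioned)

    # Stage 4: generate synthetic images from the captions used as prompts.
    synthetic = [
        Sample(generate(s.caption), s.label, s.caption)
        for s in captioned
        for _ in range(prompts_per_sample)
    ]
    return captioned + synthetic
```

Passing the models in as callables keeps the sketch runnable with stubs and avoids implying any particular library API.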

Paper Structure

This paper contains 25 sections, 3 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: An overview of our four-stage data augmentation process for the generation of synthetic images that accurately depict the intricate interactions and behaviors observed in natural environments.
  • Figure 2: An example caption generated from the second stage of our data augmentation process.
  • Figure 3: Top-1 Accuracy achieved using the different data augmentation strategies on both CLIP variants.
  • Figure 4: Top-3 Accuracy achieved using the different data augmentation strategies on both CLIP variants.
  • Figure 5: Examples of images generated using our method (top row) compared with images from the corresponding categories in the EuroSAT dataset (bottom row).
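
Figures 3 and 4 report top-1 and top-3 accuracy. For readers unfamiliar with the metric, the following is a minimal sketch of how top-k accuracy is commonly computed from per-class scores. The function name `top_k_accuracy` and the toy inputs are illustrative assumptions and are not taken from the paper's evaluation code.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true class is among the k highest scores."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equal length")
    hits = 0
    for row, true_class in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if true_class in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example: 3 samples, 4 classes (made-up scores, not paper data).
toy_scores = [
    [0.10, 0.60, 0.20, 0.10],
    [0.50, 0.15, 0.25, 0.10],
    [0.20, 0.30, 0.10, 0.40],
]
toy_labels = [1, 2, 0]
top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top3 = top_k_accuracy(toy_scores, toy_labels, k=3)
```

In the toy example only the first sample is classified correctly at rank 1, while all three true classes fall within the top three scores, so top-3 accuracy is always at least as high as top-1.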