Data Augmentation in Earth Observation: A Diffusion Model Approach
Tiago Sousa, Benoît Ries, Nicolas Guelfi
TL;DR
The paper tackles data scarcity and limited semantic diversity in Earth Observation (EO) imagery used for AI tasks. It introduces a four-stage, diffusion-based data augmentation pipeline: meta-prompts for instruction generation, vision-language captioning, domain-adapted diffusion fine-tuning via LoRA, and prompt-guided image generation to enrich EO datasets. On the EuroSAT dataset, the approach raises zero-shot accuracy to $58.07\%$ (+$16.97\%$) for CLIP-RN50 and to $69.23\%$ (+$19.83\%$) for CLIP-ViT-B/32, and achieves higher top-1/top-3 accuracy than the baselines and AutoAugment. The results demonstrate the potential of diffusion-based EO augmentation to boost robustness in low-data regimes, while also highlighting limitations in domain-specific captioning and pointing to future work on hybrid augmentation and advanced captioning models.
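As a quick sanity check on the reported numbers, the short sketch below recovers the baseline zero-shot accuracies implied by the figures above. It assumes the bracketed gains are absolute percentage-point differences rather than relative improvements, which the summary does not state explicitly; the helper name `implied_baseline` is purely illustrative. Under that assumption, the implied baselines are 41.10% for CLIP-RN50 and 49.40% for CLIP-ViT-B/32.

```python
# Reported EuroSAT zero-shot results: model -> (accuracy after augmentation, gain).
# Assumption: gains are absolute percentage points, not relative improvements.
reported = {
    "CLIP-RN50": (58.07, 16.97),
    "CLIP-ViT-B/32": (69.23, 19.83),
}


def implied_baseline(accuracy: float, gain: float) -> float:
    """Return the baseline accuracy implied by a final accuracy and an absolute gain."""
    return round(accuracy - gain, 2)


for model, (accuracy, gain) in reported.items():
    baseline = implied_baseline(accuracy, gain)
    print(f"{model}: baseline {baseline:.2f}% -> augmented {accuracy:.2f}%")
```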
Abstract
High-quality Earth Observation (EO) imagery is essential for accurate analysis and informed decision-making across sectors. However, data scarcity caused by atmospheric conditions, seasonal variations, and limited geographical coverage hinders the effective application of Artificial Intelligence (AI) in EO. Traditional data augmentation techniques, which rely on basic parameterized image transformations, often fail to introduce sufficient diversity across key semantic axes, which limits the accuracy of AI models in EO applications. These axes include natural changes such as snow and floods, human impacts such as urbanization and roads, and disasters such as wildfires and storms. To address this, we propose a four-stage data augmentation approach that integrates diffusion models to enhance semantic diversity. Our method employs meta-prompts for instruction generation, vision-language models for rich captioning, EO-specific diffusion model fine-tuning, and iterative data augmentation. Extensive experiments using four augmentation techniques demonstrate that our approach consistently outperforms established methods, generating semantically diverse EO images and improving AI model performance.
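The four stages named in the abstract can be sketched as a data flow. The code below is a structural illustration only, not the authors' implementation: every function name, the `Sample` type, the meta-prompt wording, and the variation phrases are hypothetical, and the captioning and diffusion stages are replaced by string-producing placeholders so the sketch runs without any model. A real system would call a vision-language model in stage 2 and a LoRA-fine-tuned diffusion model in stages 3 and 4.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sample:
    image_id: str      # stand-in for real pixel data
    label: str         # land-cover class, e.g. "Forest"
    caption: str = ""  # filled in by stage 2


def build_instruction(meta_prompt: str, label: str) -> str:
    # Stage 1: instantiate a meta-prompt into a captioning instruction.
    return meta_prompt.format(label=label)


def caption_image(sample: Sample, instruction: str) -> str:
    # Stage 2 (placeholder): a real system would query a vision-language model.
    return f"satellite image of {sample.label.lower()} ({instruction})"


def fine_tune_diffusion(samples: List[Sample]) -> Callable[[str, int], List[str]]:
    # Stage 3 (placeholder): stands in for fine-tuning on (image, caption) pairs.
    # The returned generator emits string tokens instead of images.
    def generate(prompt: str, n: int) -> List[str]:
        return [f"synthetic[{prompt}]#{i}" for i in range(n)]
    return generate


def augment(samples: List[Sample], meta_prompt: str,
            variations: List[str], per_prompt: int = 1) -> List[Tuple[str, str]]:
    for s in samples:
        s.caption = caption_image(s, build_instruction(meta_prompt, s.label))
    generate = fine_tune_diffusion(samples)
    augmented: List[Tuple[str, str]] = []
    # Stage 4: prompt-guided generation, one prompt per (caption, variation) pair.
    for s in samples:
        for variation in variations:
            prompt = f"{s.caption}, {variation}"
            augmented.extend((img, s.label) for img in generate(prompt, per_prompt))
    return augmented


samples = [Sample("img_001", "Forest"), Sample("img_002", "River")]
variations = ["covered in snow", "after a flood", "after a wildfire"]
augmented = augment(samples, "Describe this {label} scene in detail.",
                    variations, per_prompt=2)
print(len(augmented))
```

Each generated item keeps the class label of its source image, so the synthetic data can be merged directly into the training set. The sketch makes a single pass; the iterative augmentation mentioned in the abstract would presumably repeat stage 4 with new variations or feed outputs back into earlier stages, which is not modeled here.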
