Generating Synthetic Satellite Imagery With Deep-Learning Text-to-Image Models -- Technical Challenges and Implications for Monitoring and Verification
Tuong Vy Nguyen, Alexander Glaser, Felix Biessmann
TL;DR
This work examines the potential and risks of generating synthetic satellite imagery with deep-learning text-to-image models in the context of monitoring and verification. It applies Stable Diffusion with three conditioning-based fine-tuning approaches (DreamBooth, Textual Inversion, and Text-to-Image, abbreviated Text2Img) to nuclear-facility and UC Merced datasets, studying controllability via semantic variations (season, location, time of day) and evaluating realism with domain-adapted metrics. The findings indicate that Text2Img generally yields the strongest performance on remote-sensing data, while DreamBooth excels at fidelity for specific targets. However, automated metrics can misalign with human perception, which highlights the need for human studies and domain-specific evaluation. The paper emphasizes ethical and societal implications, including potential misuse and the need for detection, watermarking, and cross-disciplinary metric development to safeguard monitoring and verification tasks in remote sensing.
Abstract
Novel deep-learning (DL) architectures have reached a level where they can generate digital media, including photorealistic images, that are difficult to distinguish from real data. These technologies have already been used to generate training data for machine learning (ML) models, and large text-to-image models like DALL-E 2, Imagen, and Stable Diffusion are achieving remarkable results in realistic high-resolution image generation. Given these developments, issues of data authentication in monitoring and verification deserve a careful and systematic analysis: How realistic are synthetic images? How easily can they be generated? How useful are they for ML researchers, and what is their potential for Open Science? In this work, we use novel DL models to explore how synthetic satellite images can be created using conditioning mechanisms. We investigate the challenges of synthetic satellite image generation and evaluate the results based on authenticity and state-of-the-art metrics. Furthermore, we investigate how synthetic data can alleviate the lack of data in the context of ML methods for remote sensing. Finally, we discuss implications of synthetic satellite imagery in the context of monitoring and verification.
