
GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis

Srikumar Sastry, Subash Khanal, Aayush Dhakal, Nathan Jacobs

TL;DR

GeoSynth addresses the need for contextually-aware, high-resolution satellite image synthesis by enabling explicit layout control via OpenStreetMap (OSM) inputs and style control via text prompts and geographic location. It combines a latent diffusion model with ControlNet and CoordNet, using SatCLIP location embeddings to condition on geography, and supports multiple layout controls (OSM, Canny edges, SAM masks). Evaluations on a dataset of 44,848 image pairs show that geographic conditioning improves realism, as measured by FID and SSIM, that text guidance improves diversity and quality, and that the model generalizes well zero-shot. The work advances remote sensing data generation for urban planning, data augmentation, and digital-twin applications, and the code and checkpoints are publicly released.
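The conditioning scheme described above can be illustrated with a toy sketch. It shows the zero-initialization idea ControlNet is known for: a trainable branch feeds a frozen pretrained block through a projection whose weights start at zero, so training begins from the unmodified pretrained model. How CoordNet actually injects the SatCLIP embedding is not specified in this summary; routing it through a second zero-initialized projection is an assumption made purely for illustration, and every name below (`ZeroLinear`, `frozen_block`, `conditioned_block`) is hypothetical. Plain Python lists stand in for tensors and convolutions.

```python
# Toy sketch: all names are hypothetical; lists stand in for tensors.
from dataclasses import dataclass, field
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product (stand-in for a 1x1 convolution)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum, i.e. a residual connection."""
    return [a_i + b_i for a_i, b_i in zip(a, b)]


@dataclass
class ZeroLinear:
    """Projection whose weights start at zero, like ControlNet's zero convolutions."""
    in_dim: int
    out_dim: int
    weight: Matrix = field(init=False)

    def __post_init__(self) -> None:
        self.weight = [[0.0] * self.in_dim for _ in range(self.out_dim)]

    def __call__(self, x: Vector) -> Vector:
        return matvec(self.weight, x)


def frozen_block(h: Vector) -> Vector:
    """Stand-in for a frozen, pretrained UNet block (the scaling is arbitrary)."""
    return [2.0 * v for v in h]


def conditioned_block(
    h: Vector,
    layout_features: Vector,     # output of a trainable ControlNet-style branch on the layout image
    location_embedding: Vector,  # assumed: a SatCLIP embedding of the target coordinates
    layout_proj: ZeroLinear,
    location_proj: ZeroLinear,
) -> Vector:
    """Add zero-initialized residuals from both conditioning paths to the frozen output."""
    out = frozen_block(h)
    out = add(out, layout_proj(layout_features))
    out = add(out, location_proj(location_embedding))
    return out
```

At initialization both projections output zeros, so `conditioned_block(...)` returns exactly `frozen_block(h)`; the layout and location signals only change the output once training updates the projection weights. This is the property that lets new control branches be attached without disturbing the pretrained generator at the start of training.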

Abstract

We present GeoSynth, a model for synthesizing satellite images with global style and image-driven layout control. Global style is controlled via textual prompts or geographic location, which specify scene semantics and regional appearance, respectively; the two can be used together. We train our model on a large dataset of paired satellite imagery and OpenStreetMap data, with automatically generated captions. We evaluate various combinations of control inputs, including different types of layout controls. Results demonstrate that our model can generate diverse, high-quality images and exhibits excellent zero-shot generalization. The code and model checkpoints are available at https://github.com/mvrl/GeoSynth.
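A single training example, as described in the abstract and in Figure 2, can be sketched as a small record type. This is a minimal sketch under stated assumptions: the class and field names, the file-path representation, storing the location as (latitude, longitude), and deriving Canny edges from the satellite image are all hypothetical choices, not the format used in the GeoSynth repository.

```python
# Hypothetical record for one GeoSynth-style training example; names are illustrative.
from dataclasses import dataclass
from typing import Optional, Tuple

LAYOUT_CONTROLS = ("osm", "canny", "sam")


@dataclass(frozen=True)
class GeoSynthSample:
    """One training example; field names and layout are assumptions."""
    satellite_path: str                  # target satellite image
    osm_path: str                        # rendered OpenStreetMap image (layout control)
    caption: str                         # automatically generated description (style control)
    location: Tuple[float, float]        # (latitude, longitude) for geographic conditioning
    sam_mask_path: Optional[str] = None  # SAM mask of the satellite image, if stored

    def __post_init__(self) -> None:
        lat, lon = self.location
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            raise ValueError(f"invalid (lat, lon): {self.location}")

    def layout_source(self, control: str) -> str:
        """Return the file that the chosen layout control would be read or derived from."""
        if control == "osm":
            return self.osm_path
        if control == "sam":
            if self.sam_mask_path is None:
                raise ValueError("this sample has no SAM mask")
            return self.sam_mask_path
        if control == "canny":
            # Assumption: Canny edges are computed from the satellite image itself.
            return self.satellite_path
        raise ValueError(f"unknown control {control!r}; expected one of {LAYOUT_CONTROLS}")
```

Keeping the three layout sources behind one `layout_source` method makes it easy to train or evaluate the same sample under different layout controls, which mirrors the comparison of OSM, Canny, and SAM controls described in the paper.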

Paper Structure

This paper contains 16 sections, 1 equation, 12 figures, and 4 tables.

Figures (12)

  • Figure 1: Satellite images synthesized by GeoSynth using OpenStreetMap for layout control and textual prompts for style control.
  • Figure 2: Each dataset sample consists of a satellite image, an OSM image, and an automatically generated textual description. Additionally, the dataset includes SAM masks for each satellite image.
  • Figure 3: A high-level architecture overview of GeoSynth, which consists of a pre-trained LDM, ControlNet and CoordNet.
  • Figure 4: Geo-aware generation. We show four example generations of satellite images using six different geographic locations. We use the same OSM control and random seed without specifying any textual prompt.
  • Figure 5: Synthesis performance of GeoSynth when using various layout controls and text prompts.
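The protocol in Figure 4 (same OSM control, same random seed, no text prompt, only the location varied) isolates the effect of geographic conditioning, because a fixed seed fixes the initial diffusion noise. The sketch below makes that explicit by building one generation request per location; `initial_latent` and `geo_sweep` are hypothetical names, a flat list of floats stands in for a latent tensor, and no actual denoising is performed.

```python
# Hypothetical sketch of a fixed-seed location sweep; no model is invoked.
import random
from typing import Dict, List, Tuple

Latent = List[float]


def initial_latent(seed: int, size: int) -> Latent:
    """Draw the starting Gaussian noise for diffusion from a fixed seed."""
    rng = random.Random(seed)  # independent RNG, so the global random state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def geo_sweep(
    osm_control: str,
    locations: Dict[str, Tuple[float, float]],
    seed: int,
    latent_size: int = 16,
) -> List[dict]:
    """Build one request per location; only the location differs between requests."""
    requests = []
    for name, lat_lon in locations.items():
        requests.append({
            "name": name,
            "location": lat_lon,                          # the only varying input
            "osm_control": osm_control,                   # same layout for every request
            "prompt": None,                               # no text prompt, as in Figure 4
            "noise": initial_latent(seed, latent_size),   # identical starting noise
        })
    return requests
```

Because every request starts from the same noise and layout, any difference between the resulting images can be attributed to the location input. In a PyTorch-based pipeline, the same effect is usually obtained by creating a `torch.Generator` seeded with `manual_seed(seed)` before each call.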