OSMGen: Highly Controllable Satellite Image Synthesis using OpenStreetMap Data

Amir Ziashahabi, Narges Ghasemi, Sajjad Shahabi, John Krumm, Salman Avestimehr, Cyrus Shahabi

TL;DR

OSMGen addresses the scarcity of labeled geospatial data by synthesizing realistic satellite imagery conditioned on rich OpenStreetMap JSON. It advances controllable generation by combining multi-modal conditioning (vector geometries, semantic tags, location, and time) with text guidance in a ControlNet-enabled diffusion pipeline that also employs SatCLIP and Date2Vec. A key contribution is producing perfectly co-registered before/after image pairs via DDIM inversion, enabling data augmentation, change detection training, and interactive planning previews from map edits. The approach supports a closed-loop vision-to-map pipeline that could drive automated, structured OSM updates from satellite imagery, reducing manual map maintenance.

Abstract

Accurate and up-to-date geospatial data are essential for urban planning, infrastructure monitoring, and environmental management. Yet, automating urban monitoring remains difficult because curated datasets of specific urban features and their changes are scarce. We introduce OSMGen, a generative framework that creates realistic satellite imagery directly from raw OpenStreetMap (OSM) data. Unlike prior work that relies on raster tiles, OSMGen uses the full richness of OSM JSON, including vector geometries, semantic tags, location, and time, giving fine-grained control over how scenes are generated. A central feature of the framework is the ability to produce consistent before-after image pairs: user edits to OSM inputs translate into targeted visual changes, while the rest of the scene is preserved. This makes it possible to generate training data that addresses scarcity and class imbalance, and to give planners a simple way to preview proposed interventions by editing map data. More broadly, OSMGen produces paired (JSON, image) data for both static and changed states, paving the way toward a closed-loop system where satellite imagery can automatically drive structured OSM updates. Source code is available at https://github.com/amir-zsh/OSMGen.

Paper Structure

This paper contains 29 sections, 10 equations, 5 figures.

Figures (5)

  • Figure 1: Overview of our ControlNet pipeline. Semantic masks are fed into ControlNet to generate control feature maps that are added into the U-Net; spatial and temporal embeddings are summed into the timestep embedding; the text prompt is injected via cross-attention.
  • Figure 2: Qualitative evaluation on held-out FMoW locations. This layout highlights the model’s ability both to reproduce large-scale structure and to capture fine-grained POI details in context.
  • Figure 3: Editing via DDIM inversion. Edits are applied locally while preserving consistency outside the modified region.
  • Figure 4: Change synthesis via DDIM inversion.
  • Figure 5: Seasonal conditioning: for fixed masks, varying the date input produces distinct winter and summer images.
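
The editing workflow behind Figures 3 and 4 can be illustrated with a minimal sketch of deterministic DDIM inversion. It is not the authors' implementation. The noise predictor `toy_eps`, the linear `make_alpha_bars` schedule, and the four-pixel "image" are all hypothetical stand-ins chosen so the example runs without a trained model. Because the toy predictor depends only on the per-pixel mask, inversion here is exact and edits stay local by construction; with a real network, inversion is only approximate.

```python
import math

def make_alpha_bars(num_steps, final=0.05):
    """Hypothetical schedule: signal level 1.0 at t=0, decreasing linearly to `final`."""
    return [1.0 - (1.0 - final) * t / num_steps for t in range(num_steps + 1)]

def toy_eps(x, t, mask):
    """Hypothetical stand-in for the conditioned noise predictor.
    It looks only at the per-pixel mask value, not at x or t."""
    return [0.1 * (0.5 - m) for m in mask]

def ddim_move(x, eps, a_from, a_to):
    """Deterministic DDIM update from noise level a_from to noise level a_to."""
    out = []
    for xi, ei in zip(x, eps):
        x0 = (xi - math.sqrt(1.0 - a_from) * ei) / math.sqrt(a_from)  # predicted clean pixel
        out.append(math.sqrt(a_to) * x0 + math.sqrt(1.0 - a_to) * ei)
    return out

def ddim_invert(image, mask, alpha_bars):
    """Walk a clean image up the schedule to a latent (t = 0 .. T)."""
    x = list(image)
    for t in range(len(alpha_bars) - 1):
        x = ddim_move(x, toy_eps(x, t, mask), alpha_bars[t], alpha_bars[t + 1])
    return x

def ddim_sample(latent, mask, alpha_bars):
    """Walk a latent back down the schedule to an image (t = T .. 0)."""
    x = list(latent)
    for t in range(len(alpha_bars) - 1, 0, -1):
        x = ddim_move(x, toy_eps(x, t, mask), alpha_bars[t], alpha_bars[t - 1])
    return x

alpha_bars = make_alpha_bars(50)
before_image = [0.2, 0.4, 0.6, 0.8]
mask_before = [0.0, 0.0, 1.0, 1.0]
mask_after = [0.0, 1.0, 1.0, 1.0]  # the "map edit": pixel 1 changes class

latent = ddim_invert(before_image, mask_before, alpha_bars)
reconstruction = ddim_sample(latent, mask_before, alpha_bars)  # same conditioning
after_image = ddim_sample(latent, mask_after, alpha_bars)      # edited conditioning
```

Sampling from the shared latent with the original mask reproduces `before_image`, while sampling with the edited mask changes only the edited pixel. That is the property that keeps a before/after pair aligned outside the modified region.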