Geodiffussr: Generative Terrain Texturing with Elevation Fidelity

Tai Inui, Alexander Matsumura, Edgar Simo-Serra

TL;DR

This work tackles fast, controllable terrain generation that strictly adheres to a given Digital Elevation Map (DEM) while enabling text-driven texture synthesis. It introduces Geodiffussr, a flow-matching pipeline with multi-scale content aggregation (MCA) that injects DEM features from a pretrained VGG-16 into a UNet, conditioned on text via cross-attention, and augmented by upscaling for rendering. The key finding is that full MCA substantially improves perceptual texture quality and elevation-texture alignment (e.g., FID 10.29, LPIPS 0.066, ΔdCor 0.0016) compared to non-MCA baselines, establishing a strong baseline for 2.5D terrain ideation and previz. The work also provides a biome-diverse DEM–satellite dataset and discusses practical paths to production-scale resolutions, positioning the approach as complementary to physical terrain and ecosystem models.

Abstract

Large-scale terrain generation remains a labor-intensive task in computer graphics. We introduce Geodiffussr, a flow-matching pipeline that synthesizes text-guided texture maps while strictly adhering to a supplied Digital Elevation Map (DEM). The core mechanism is multi-scale content aggregation (MCA): DEM features from a pretrained encoder are injected into UNet blocks at multiple resolutions to enforce global-to-local elevation consistency. Compared with a non-MCA baseline, MCA markedly improves visual fidelity and strengthens height-appearance coupling (FID $\downarrow$ 49.16%, LPIPS $\downarrow$ 32.33%, $\Delta$dCor $\downarrow$ to 0.0016). To train and evaluate Geodiffussr, we assemble a globally distributed, biome- and climate-stratified corpus of triplets pairing SRTM-derived DEMs with Sentinel-2 imagery and vision-grounded natural-language captions that describe visible land cover. We position Geodiffussr as a strong baseline and a step toward controllable 2.5D landscape generation for coarse-scale ideation and previz, complementary to physically based terrain and ecosystem simulators.
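The height-appearance coupling reported above is measured with distance correlation (dCor). As a rough illustration only, the sketch below computes the sample distance correlation between two equal-length sequences, for example per-pixel elevation and a per-pixel brightness value. The function names, the use of scalar brightness as the appearance signal, and the reading of ΔdCor as the absolute gap between the generated and reference dCor values are all assumptions made for this example; the excerpt does not spell out the paper's exact evaluation protocol.

```python
import math


def _double_centered_distances(values):
    """Pairwise |v_i - v_j| matrix with row, column and grand means removed."""
    n = len(values)
    dist = [[abs(vi - vj) for vj in values] for vi in values]
    row_mean = [sum(row) / n for row in dist]
    grand_mean = sum(row_mean) / n
    # dist is symmetric, so its column means equal its row means.
    return [
        [dist[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def _mean_product(a, b):
    """Mean of the element-wise product of two n x n matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) of two scalar sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = _mean_product(a, b)
    denom = math.sqrt(_mean_product(a, a) * _mean_product(b, b))
    if denom == 0.0:
        # One of the inputs is constant, so no dependence can be measured.
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)


def delta_dcor(elevation, reference, generated):
    """ASSUMED reading of ΔdCor: gap between generated and reference coupling."""
    return abs(
        distance_correlation(elevation, generated)
        - distance_correlation(elevation, reference)
    )
```

Under this reading, a value near zero means the generated texture depends on elevation about as strongly as the reference imagery does. The pairwise matrices grow quadratically with the number of samples, so in practice one would evaluate on a random subset of pixels rather than a full tile.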

Paper Structure

This paper contains 14 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Examples of rendered 2.5D terrains using our proposed approach. We introduce Geodiffussr, a flow matching-based generative pipeline that creates terrain texture maps from intuitive text prompts while realistically adhering to a specified Digital Elevation Map (DEM) by leveraging multi-scale content aggregation (MCA). This provides a new baseline for text-conditioned, DEM-aware terrain synthesis and a stepping stone toward fully controllable landscape generation.
  • Figure 2: Geodiffussr pipeline. We condition a flow matching model on both text embeddings and Digital Elevation Maps (DEMs). For DEMs specifically, we take multi-scale features from a pretrained VGG-16 model and inject them into the UNet blocks. For rendering, the source DEM is increased in resolution via subdivision and the generated texture map via Real-ESRGAN super-resolution (Wang et al., 2021). Combining the two yields a 2.5D representation of the terrain, as shown on the right.
  • Figure 3: Comparison of generated results between Full MCA (center) and non-MCA (right) versions of Geodiffussr. The textures are generated with various prompts featuring different biomes and a source DEM (left).
  • Figure 4: Sketch DEMs. Geodiffussr generalizes to user-drawn synthetic DEMs, producing coherent, prompt-consistent textures. This demonstrates the model's robustness to unseen, complex geometry and its potential for use with user-guided DEMs.
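The multi-scale conditioning path described in the Figure 2 caption can be pictured with a small toy. The sketch below is not the authors' implementation: repeated 2x2 average pooling stands in for the pretrained VGG-16 feature maps, element-wise addition stands in for the injection operator (which this excerpt does not specify), and every function name is hypothetical. It only shows the shape of the idea, which is one DEM-derived map per resolution, merged into the UNet block feature of matching size.

```python
def avg_pool_2x2(grid):
    """Halve resolution by averaging non-overlapping 2x2 blocks.

    Assumes both dimensions are even.
    """
    h, w = len(grid), len(grid[0])
    return [
        [
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4.0
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


def dem_pyramid(dem, levels):
    """Build `levels` DEM maps, fine to coarse (stand-in for encoder features)."""
    pyramid = [[[float(v) for v in row] for row in dem]]
    for _ in range(levels - 1):
        pyramid.append(avg_pool_2x2(pyramid[-1]))
    return pyramid


def inject(block_feature, dem_feature, weight=1.0):
    """ASSUMED injection: add the weighted DEM map to a same-size block feature."""
    if len(block_feature) != len(dem_feature) or len(block_feature[0]) != len(
        dem_feature[0]
    ):
        raise ValueError("feature maps must share a resolution")
    return [
        [f + weight * d for f, d in zip(f_row, d_row)]
        for f_row, d_row in zip(block_feature, dem_feature)
    ]


def aggregate(block_features, dem, weight=1.0):
    """Inject one DEM pyramid level into each block feature, fine to coarse."""
    pyramid = dem_pyramid(dem, len(block_features))
    return [inject(f, d, weight) for f, d in zip(block_features, pyramid)]


# Usage: a 4x4 DEM conditioning three block features of size 4x4, 2x2 and 1x1.
dem = [[r * 4 + c for c in range(4)] for r in range(4)]
blocks = [
    [[0.0] * 4 for _ in range(4)],
    [[0.0] * 2 for _ in range(2)],
    [[0.0]],
]
conditioned = aggregate(blocks, dem, weight=0.5)
```

Because every level receives elevation information at its own resolution, coarse blocks see the overall landform while fine blocks see local relief, which is the global-to-local consistency the abstract attributes to MCA. A real model would use learned multi-channel features and a learned projection in place of the fixed pooling and scalar weight used here.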