Terrain Diffusion Network: Climatic-Aware Terrain Generation with Geological Sketch Guidance
Zexin Hu, Kun Hu, Clinton Mo, Lei Pan, Zhiyong Wang
TL;DR
The Terrain Diffusion Network (TDN) tackles controllable, climate-aware terrain generation by combining a diffusion-based framework with a multi-level denoising scheme and pre-trained autoencoders for terrains and sketches. Three specialized synthesizers operate at the structural, intermediate, and fine-grained levels, so that broad geomorphology and climatic patterns are modeled separately, all guided by user sketches encoded into aligned latent spaces. Evaluated on a dataset derived from NASA Topology Images, TDN achieves state-of-the-art performance (e.g., superior FID and MSE) and adheres robustly to sketches, including in out-of-domain scenarios. The approach enables fine-grained, climate-sensitive terrain synthesis for games, animation, and virtual environments; the authors plan to release the code and dataset publicly.
Abstract
Sketch-based terrain generation seeks to create realistic landscapes for virtual environments in applications such as computer games, animation and virtual reality. Recently, deep learning based terrain generation has emerged, notably methods based on generative adversarial networks (GANs). However, these methods often struggle to offer flexible user control while maintaining the generative diversity needed for realistic terrain. Therefore, we propose a novel diffusion-based method, namely the terrain diffusion network (TDN), which actively incorporates user guidance for enhanced controllability, taking into account terrain features such as rivers, ridges, basins, and peaks. Instead of adhering to a conventional monolithic denoising process, which often compromises the fidelity of terrain details or the alignment with user control, we propose a multi-level denoising scheme that generates more realistic terrains by accounting for fine-grained details, particularly those related to climatic patterns influenced by erosion and tectonic activities. Specifically, three terrain synthesisers are designed for structural, intermediate, and fine-grained level denoising, which allows each synthesiser to concentrate on a distinct terrain aspect. Moreover, to maximise the efficiency of our TDN, we further introduce terrain and sketch latent spaces for the synthesisers, obtained with pre-trained terrain autoencoders. Comprehensive experiments on a new dataset constructed from NASA Topology Images clearly demonstrate the effectiveness of our proposed method, which achieves state-of-the-art performance. Our code and dataset will be publicly available.
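To make the multi-level denoising idea concrete, the following is a minimal sketch of how a reverse diffusion loop might hand each timestep to one of three level-specific synthesisers. Everything beyond the three level names is an assumption for illustration: the abstract does not state how timesteps are divided, so the equal thirds in `pick_level` are hypothetical, and `make_toy_synthesiser` is a toy stand-in (a simple pull towards the sketch latent), not the trained denoisers or the sampler used by TDN.

```python
import numpy as np


def pick_level(t, num_steps):
    """Map a timestep to a level name (t = num_steps - 1 is the noisiest).

    ASSUMPTION: equal thirds; the abstract does not specify the split.
    """
    if 3 * t >= 2 * num_steps:
        return "structural"
    if 3 * t >= num_steps:
        return "intermediate"
    return "fine"


def make_toy_synthesiser(strength):
    """Hypothetical stand-in for a trained level-specific denoiser.

    It moves the latent a fixed fraction of the way towards the sketch latent.
    """
    def step(latent, sketch_latent, t):
        return latent + strength * (sketch_latent - latent)
    return step


def multi_level_denoise(sketch_latent, num_steps=30, seed=0):
    """Run a toy reverse process, dispatching each step by level."""
    rng = np.random.default_rng(seed)
    synthesisers = {
        "structural": make_toy_synthesiser(0.30),    # coarse layout
        "intermediate": make_toy_synthesiser(0.15),  # mid-scale features
        "fine": make_toy_synthesiser(0.05),          # fine detail
    }
    latent = rng.standard_normal(sketch_latent.shape)  # start from noise
    schedule = []
    for t in reversed(range(num_steps)):
        level = pick_level(t, num_steps)
        latent = synthesisers[level](latent, sketch_latent, t)
        schedule.append(level)
    return latent, schedule


if __name__ == "__main__":
    sketch = np.zeros((4, 4))  # placeholder sketch latent
    out, schedule = multi_level_denoise(sketch)
    print(schedule[0], schedule[len(schedule) // 2], schedule[-1])
```

The point of the sketch is only the dispatch pattern: early, noisy steps go to the structural synthesiser and late steps to the fine-grained one, so each model sees a distinct part of the denoising trajectory.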
