UP-Diff: Latent Diffusion Model for Remote Sensing Urban Prediction
Zeyu Wang, Zecheng Hao, Jingyu Lin, Yuchao Feng, Yufei Guo
TL;DR
The paper introduces Remote Sensing (RS) Urban Prediction (UP), a task that forecasts future urban layouts from existing layouts and planned changes without requiring paired pre-change/post-change images. It presents UP-Diff, a latent diffusion model that encodes pre-change layouts and change maps into a latent space and employs a gated, layout-aware cross-attention mechanism within an autoencoder-based diffusion framework to generate high-fidelity post-change RS images. The approach leverages pre-trained Stable Diffusion weights, a ConvNeXt encoder for layout conditioning, and a CLIP-based text condition, achieving superior LPIPS and FID scores on LEVIR-CD and SYSU-CD compared with strong baselines, while enabling dynamic modification of planned changes. Experimental results demonstrate UP-Diff’s potential for practical urban planning, offering accurate, flexible generation of future layouts and paving the way for diverse planning scenarios. The work contributes the first RS UP formulation, a novel layout-conditioned diffusion model, and a demonstration that diffusion-based RS UP can outperform conventional Change Detection (CD) methods in both fidelity and planning utility.
Abstract
This study introduces a novel Remote Sensing (RS) Urban Prediction (UP) task focused on future urban planning, which aims to forecast urban layouts from existing urban layouts and planned change maps. To address this task, we propose UP-Diff, which leverages a Latent Diffusion Model (LDM) to capture position-aware embeddings of pre-change urban layouts and planned change maps. Specifically, the trainable cross-attention layers within UP-Diff's iterative diffusion modules enable the model to dynamically highlight crucial regions for targeted modifications. With UP-Diff, designers can refine and adjust future urban plans dynamically and adaptively by modifying the change maps. Compared with conventional RS Change Detection (CD) methods, UP-Diff does not require paired pre-change and post-change images, which makes it more practical for city development. Experimental results on the LEVIR-CD and SYSU-CD datasets show UP-Diff's ability to accurately predict future urban layouts with high fidelity, demonstrating its potential for urban planning. Code and model weights are available at https://github.com/zeyuwang-zju/UP-Diff.
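
To make the role of the cross-attention layers concrete, the following is a minimal sketch of a gated cross-attention update, in which image-latent tokens attend to layout-condition tokens. It is an illustration under stated assumptions, not the authors' implementation. The tanh-gated residual form is an assumption, since the exact gating is not specified here, and the function names (`cross_attention`, `gated_cross_attention`) are hypothetical. Learned query/key/value projections and multiple heads are omitted for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Single-head scaled dot-product cross-attention.

    queries: list of image-latent token vectors.
    context: list of layout-condition token vectors, used as keys and values.
    Learned projections are omitted (assumption made to keep the sketch short).
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        # Similarity of this latent token to every layout token.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context
        ]
        weights = softmax(scores)
        # Weighted average of the layout tokens.
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, context)) for j in range(dim)]
        )
    return attended


def gated_cross_attention(queries, context, gate):
    """Residual update x + tanh(gate) * Attn(x, context).

    The tanh gate is an assumed form: gate = 0 leaves the latents unchanged,
    and a larger |gate| lets the layout condition influence them more.
    """
    g = math.tanh(gate)
    attended = cross_attention(queries, context)
    return [
        [x + g * a for x, a in zip(q_row, a_row)]
        for q_row, a_row in zip(queries, attended)
    ]
```

With the gate at zero the layer acts as an identity mapping, which is one common motivation for gating when new conditioning layers are added to pre-trained weights: training can start from the behavior of the original model and introduce the layout signal gradually.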
