UP-Diff: Latent Diffusion Model for Remote Sensing Urban Prediction

Zeyu Wang, Zecheng Hao, Jingyu Lin, Yuchao Feng, Yufei Guo

TL;DR

The paper introduces Remote Sensing Urban Prediction (UP), a task that forecasts future urban layouts from existing layouts and planned changes without requiring paired pre-change/post-change images. It presents UP-Diff, a latent diffusion model that encodes pre-change layouts and change maps into a latent space and uses a gated, layout-aware cross-attention mechanism within an autoencoder-based diffusion framework to generate high-fidelity post-change RS images. The approach builds on pre-trained Stable Diffusion weights, a ConvNeXt encoder for layout conditioning, and a CLIP-based text condition. It achieves better LPIPS and FID scores than the compared baselines on LEVIR-CD and SYSU-CD, and it lets users modify the planned changes dynamically. The experimental results indicate UP-Diff's potential for practical urban planning: it generates future layouts accurately and flexibly, and it supports diverse planning scenarios. The work contributes the first RS UP formulation, a novel layout-conditioned diffusion model, and a demonstration that diffusion-based RS UP can outperform conventional CD methods in both fidelity and planning utility.

Abstract

This study introduces a novel Remote Sensing (RS) Urban Prediction (UP) task focused on future urban planning, which aims to forecast urban layouts using information from existing urban layouts and planned change maps. To address the proposed RS UP task, we propose UP-Diff, which leverages a Latent Diffusion Model (LDM) to capture position-aware embeddings of pre-change urban layouts and planned change maps. Specifically, the trainable cross-attention layers within UP-Diff's iterative diffusion modules enable the model to dynamically highlight crucial regions for targeted modifications. With UP-Diff, designers can refine and adjust future urban plans dynamically and adaptively by modifying the change maps. Compared with conventional RS Change Detection (CD) methods, the proposed UP-Diff for the RS UP task does not require paired pre-change and post-change images, which makes it more practical for city development. Experimental results on the LEVIR-CD and SYSU-CD datasets show UP-Diff's ability to accurately predict future urban layouts with high fidelity, demonstrating its potential for urban planning. Code and model weights are available at https://github.com/zeyuwang-zju/UP-Diff.
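
To make the conditioning idea concrete, the following is a minimal NumPy sketch of one gated cross-attention step, in which image latent tokens attend to tokens derived from the pre-change layout and the planned change map. It is an illustration under stated assumptions, not the authors' implementation: the function name `gated_cross_attention`, the tanh gate, the token shapes, the random projection weights, and the choice to concatenate layout and change tokens along the token axis are all hypothetical. The released code at the repository above is the authoritative reference.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gated_cross_attention(latent, cond, w_q, w_k, w_v, gate):
    """One gated cross-attention step (hypothetical sketch).

    latent: (n, d)   image latent tokens (queries)
    cond:   (m, d_c) conditioning tokens (keys and values)
    gate:   scalar; tanh(gate) scales the attention residual
    """
    q = latent @ w_q                         # (n, d_k)
    k = cond @ w_k                           # (m, d_k)
    v = cond @ w_v                           # (m, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    attn = softmax(scores, axis=-1)          # each row sums to 1 over cond tokens
    return latent + np.tanh(gate) * (attn @ v)


rng = np.random.default_rng(0)
n, m, d, d_c, d_k = 16, 8, 32, 24, 32

latent = rng.normal(size=(n, d))
# Assumption: layout and change-map embeddings are joined along the token axis.
layout_tokens = rng.normal(size=(m // 2, d_c))
change_tokens = rng.normal(size=(m // 2, d_c))
cond = np.concatenate([layout_tokens, change_tokens], axis=0)

# Random stand-ins for learned projection matrices.
w_q = rng.normal(size=(d, d_k)) * 0.1
w_k = rng.normal(size=(d_c, d_k)) * 0.1
w_v = rng.normal(size=(d_c, d)) * 0.1

out_closed = gated_cross_attention(latent, cond, w_q, w_k, w_v, gate=0.0)
out_open = gated_cross_attention(latent, cond, w_q, w_k, w_v, gate=1.0)
```

With the gate at zero the layer reduces to the identity, so a pre-trained backbone would be left unchanged at the start of fine-tuning. As the gate moves away from zero, the layout and change-map tokens increasingly influence the latent. Editing the change-map tokens and rerunning the step is the sketch's analogue of modifying the planned changes.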

Paper Structure (16 sections, 8 equations, 4 figures, 2 tables)


Figures (4)

  • Figure 1: Illustration of the difference between (a) conventional RS Change Detection (CD) and (b) our proposed RS Urban Prediction (UP).
  • Figure 2: Illustration of our proposed UP-Diff for Remote Sensing Urban Prediction. (a) Training of the autoencoder for reconstruction. (b) Training of UP-UNet for latent diffusion and denoising. C denotes the concatenation.
  • Figure 3: Qualitative results of the baseline methods and our UP-Diff on LEVIR-CD and SYSU-CD datasets for the proposed RS UP task.
  • Figure 4: Qualitative comparison on the generated images for RS CD task.