Urban Air Temperature Prediction using Conditional Diffusion Models

Siyang Dai, Jun Liu, Ngai-Man Cheung

TL;DR

This work tackles high-resolution prediction of urban air temperature ($T_a$) using satellite-derived inputs and a diffusion-based framework. It introduces the LSTAT-20K benchmark and DiffTemp, a conditional diffusion model with ControlNet that preserves spatial patterns and enables downscaling to 100m resolution from LST and LULC features. Across same-resolution, super-resolution, and sparse-sample settings, DiffTemp achieves state-of-the-art performance, with higher SSIM and lower MAE/RMSE than prior methods, while also enabling urban-planning simulations. The study provides a computer-vision-friendly pathway for urban microclimate research and establishes a benchmark to spur future work in high-resolution environmental mapping.
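
For readers unfamiliar with the error metrics named above, the following is a minimal sketch of MAE and RMSE over paired pixel values. It is not the paper's evaluation code: the function names and the sample temperatures are hypothetical and chosen only for illustration, and SSIM is omitted because it needs a windowed image comparison that does not fit a short example.

```python
import math


def mae(pred, target):
    """Mean absolute error over two equally long sequences of values."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def rmse(pred, target):
    """Root mean squared error over two equally long sequences of values."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


# Hypothetical flattened T_a pixels in degrees Celsius (not data from the paper).
predicted = [30.0, 31.0, 29.0, 32.0]
reference = [30.5, 30.0, 29.0, 34.0]

print("MAE :", mae(predicted, reference))
print("RMSE:", rmse(predicted, reference))
```

Because RMSE squares each error before averaging, the single 2-degree miss in this toy sample raises RMSE more than it raises MAE, which is why the two metrics are usually reported together.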

Abstract

Urbanization as a global trend has led to many environmental challenges, including the urban heat island (UHI) effect. The increase in temperature has a significant impact on the well-being of urban residents. Air temperature ($T_a$) at 2m above the surface is a key indicator of the UHI effect. How land use and land cover (LULC) affect $T_a$ is a critical research question that requires high-resolution (HR) $T_a$ data at neighborhood scale. However, weather stations providing $T_a$ measurements are sparsely distributed (e.g., more than 10km apart), and numerical models are impractically slow and computationally expensive. In this work, we propose a novel method to predict HR $T_a$ at 100m ground separation distance (gsd) using land surface temperature (LST) and other LULC-related features that can be easily obtained from satellite imagery. Our method leverages diffusion models for the first time to generate accurate and visually realistic HR $T_a$ maps, outperforming prior methods. We pave the way for meteorological research using computer vision techniques by providing a dataset with extended spatial and temporal coverage and high spatial resolution as a benchmark for future research. Furthermore, we show that our model can be applied to urban planning by simulating the impact of different urban designs on $T_a$.

Paper Structure

This paper contains 18 sections, 5 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Task of HR $T_a$ prediction given LST and LULC features derived from satellite imagery.
  • Figure 2: Illustration of the LSTAT-20K dataset. Each row represents a paired sample of $T_a$, LST, RGB, NDVI, NDBI and NDWI images and metadata, with $T_a$ and LST plotted on a common scale for visual comparison ($T_a$ is generally more uniform and has a narrower range than LST). Samples (b) and (c) illustrate scan line failures and clouds on the satellite images, respectively.
  • Figure 3: The pipeline of DiffTemp. The forward process adds noise to the target $T_a$ image until it is close to the LST image. The ControlNet consumes the target latent of $T_a$ and takes the satellite-derived LST, RGB and index images, together with the metadata, as conditioning inputs. The denoising U-Net predicts the noise at each step given the target latent of $T_a$ and the residuals from ControlNet's downsampling and middle blocks. The reverse process removes noise from the LST image to recover the target $T_a$ image.
  • Figure 4: Qualitative comparison of the $T_a$ maps predicted by our method (DiffTemp) and by prior methods.
  • Figure 5: By simulating different urban designs, we show the impact of water bodies, green spaces, and buildings on the air temperature at neighborhood scale. Note: LST and $T_a$ are not drawn to the same scale, to better visualize the temperature characteristics within each image.
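
The NDVI, NDBI and NDWI inputs listed for Figure 2 are standard normalized-difference spectral indices. The sketch below shows how such index images are commonly computed from satellite reflectance bands. It is an illustration under stated assumptions, not the dataset's actual preprocessing: the band names, the helper `index_map`, the zero-denominator handling, and the choice of the green/NIR form of NDWI (rather than the NIR/SWIR variant) are all assumptions.

```python
def normalized_difference(a, b, eps=1e-9):
    """Return (a - b) / (a + b), or 0.0 when the denominator is near zero."""
    denom = a + b
    if abs(denom) < eps:
        return 0.0  # assumption: treat empty or masked pixels as neutral
    return (a - b) / denom


def ndvi(nir, red):
    """Vegetation index: high where near-infrared exceeds red reflectance."""
    return normalized_difference(nir, red)


def ndbi(swir, nir):
    """Built-up index: high where shortwave-infrared exceeds near-infrared."""
    return normalized_difference(swir, nir)


def ndwi(green, nir):
    """Water index (green/NIR form, assumed here): high over open water."""
    return normalized_difference(green, nir)


def index_map(fn, band_a, band_b):
    """Apply a two-band index pixel-wise to equally sized 2D grids (lists of rows)."""
    return [
        [fn(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(band_a, band_b)
    ]


# Hypothetical 2x2 reflectance patches (values in [0, 1]); not data from LSTAT-20K.
nir_band = [[0.5, 0.2], [0.4, 0.0]]
red_band = [[0.1, 0.2], [0.1, 0.0]]

for row in index_map(ndvi, nir_band, red_band):
    print(row)
```

Each index lies in [-1, 1], so the resulting maps can be stacked with the RGB and LST images as additional conditioning channels without further range normalization between indices.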