UrbanDIFF: A Denoising Diffusion Model for Spatial Gap Filling of Urban Land Surface Temperature Under Dense Cloud Cover

Arya Chavoshi; Hassan Dashtian; Naveen Sudharsan; Dev Niyogi

TL;DR

Cloud contamination in land surface temperature (LST) imagery hinders continuous surface urban heat island (SUHI) monitoring. UrbanDIFF addresses this with a purely spatial, diffusion-based gap-filling method conditioned on static urban structure and elevation. It uses a denoising diffusion probabilistic model (DDPM) with pixel-guided refinement and RePaint-style projection to enforce consistency with revealed pixels, and is trained on MODIS Terra LST from seven US metropolitan areas (2002–2025). In tests with synthetic cloud masks, UrbanDIFF outperforms an interpolation baseline, especially under dense occlusion, and yields robust SUHI estimates that are consistent across cities. Although it is not a full operational replacement for spatiotemporal or multi-sensor methods, UrbanDIFF provides a methodological foundation for purely spatial LST reconstruction and for future extensions that incorporate temporal context and uncertainty handling.
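
To make the RePaint-style projection concrete, the following is a minimal NumPy sketch of a single projection step as it is commonly formulated for diffusion inpainting. It is not the authors' implementation: the function name `repaint_project`, its arguments, and the convention that `known_mask` is 1 on cloud-free pixels are all assumptions made for illustration. At each reverse step the cloud-free pixels are overwritten with a forward-noised copy of the observed values, so the final sample agrees with the observations wherever they exist.

```python
import numpy as np


def repaint_project(x_prev_model, x0_observed, known_mask, alpha_bar_prev, rng):
    """One RePaint-style projection step (illustrative sketch, hypothetical names).

    x_prev_model   : model sample for step t-1 (used where pixels are hidden)
    x0_observed    : clean observed image (only trusted where known_mask == 1)
    known_mask     : 1.0 on cloud-free pixels, 0.0 on cloud-covered pixels
    alpha_bar_prev : cumulative noise-schedule product at step t-1, in (0, 1]
    rng            : numpy.random.Generator used to draw the forward noise
    """
    noise = rng.standard_normal(x0_observed.shape)
    # Forward-diffuse the observed pixels to the noise level of step t-1.
    x_prev_known = (
        np.sqrt(alpha_bar_prev) * x0_observed
        + np.sqrt(1.0 - alpha_bar_prev) * noise
    )
    # Keep observed content where it is known, model content elsewhere.
    return known_mask * x_prev_known + (1.0 - known_mask) * x_prev_model
```

When `alpha_bar_prev` reaches 1 at the final step, the noise term vanishes and the known pixels equal the observations exactly, which is how this kind of projection enforces consistency with revealed pixels.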

Abstract

Satellite-derived Land Surface Temperature (LST) products are central to surface urban heat island (SUHI) monitoring due to their consistent grid-based coverage over large metropolitan regions. However, cloud contamination frequently obscures LST observations, limiting their usability for continuous SUHI analysis. Most existing LST reconstruction methods rely on multitemporal information or multisensor data fusion, requiring auxiliary observations that may be unavailable or unreliable under persistent cloud cover. Purely spatial gap-filling approaches offer an alternative, but traditional statistical methods degrade under large or spatially contiguous gaps, while many deep-learning-based spatial models deteriorate rapidly with increasing missingness. Recent advances in denoising-diffusion-based image inpainting models have demonstrated improved robustness under high missingness, motivating their adoption for spatial LST reconstruction. In this work, we introduce UrbanDIFF, a purely spatial denoising diffusion model for reconstructing cloud-contaminated urban LST imagery. The model is conditioned on static urban structure information, including built-up surface data and a digital elevation model, and enforces strict consistency with revealed cloud-free pixels through a supervised pixel-guided refinement step during inference. UrbanDIFF is trained and evaluated using NASA MODIS Terra LST data from seven major United States metropolitan areas spanning 2002 to 2025. Experiments using synthetic cloud masks with 20 to 85 percent coverage show that UrbanDIFF consistently outperforms an interpolation baseline, particularly under dense cloud occlusion, achieving an SSIM of 0.89, an RMSE of 1.2 K, and an R² of 0.84 at 85 percent cloud coverage, while exhibiting slower performance degradation as cloud density increases.
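
As a rough illustration of the evaluation protocol described above, the sketch below hides a chosen fraction of pixels with a synthetic mask and scores a reconstruction by RMSE and R² on the hidden pixels only. Several things here are assumptions rather than details from the paper: the helper names `random_mask` and `masked_rmse_r2` are hypothetical, the mask is drawn pixel by pixel (real cloud masks are spatially contiguous, and the paper's mask generator is not specified here), and restricting the metrics to hidden pixels is one reasonable convention, not necessarily the one the authors used.

```python
import numpy as np


def random_mask(shape, coverage, rng):
    """Hypothetical mask generator: True marks hidden (cloud-covered) pixels.

    Thresholding a uniform random field at its `coverage` quantile hides
    very close to the requested fraction of pixels. The result is NOT
    spatially contiguous, unlike real cloud cover.
    """
    field = rng.random(shape)
    return field < np.quantile(field, coverage)


def masked_rmse_r2(truth, pred, hidden):
    """RMSE (in the units of `truth`, e.g. kelvin) and R² over hidden pixels."""
    t = truth[hidden]
    p = pred[hidden]
    rmse = np.sqrt(np.mean((p - t) ** 2))
    ss_res = np.sum((t - p) ** 2)          # residual sum of squares
    ss_tot = np.sum((t - t.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return rmse, r2


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic stand-in for an LST tile in kelvin (not real MODIS data).
    truth = 300.0 + 5.0 * rng.standard_normal((100, 100))
    hidden = random_mask(truth.shape, coverage=0.85, rng=rng)
    # Stand-in reconstruction: truth plus small noise on the hidden pixels.
    pred = truth.copy()
    pred[hidden] += rng.standard_normal(int(hidden.sum()))
    rmse, r2 = masked_rmse_r2(truth, pred, hidden)
    print(f"hidden fraction={hidden.mean():.2f}  RMSE={rmse:.2f} K  R2={r2:.3f}")
```

Scoring only the hidden pixels avoids inflating the metrics with pixels that were never occluded, which matters most at low cloud coverage where the revealed region dominates the image.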
