WGAST: Weakly-Supervised Generative Network for Daily 10 m Land Surface Temperature Estimation via Spatio-Temporal Fusion
Sofiane Bouaziz, Adel Hafiane, Raphael Canals, Rachid Nedjai
TL;DR
WGAST tackles the challenge of generating daily 10 m land surface temperature (LST) by fusing Terra MODIS, Landsat 8, and Sentinel-2 data within a weakly supervised generative adversarial framework. The generator combines multi-level feature extraction, cosine-similarity fusion, AdaIN alignment, and temporal attention, followed by a U-Net-like reconstruction and Gaussian noise suppression. Training relies on 3×3 spatial averaging, which aggregates each 3×3 block of 10 m output pixels into one 30 m pixel so the prediction can be compared with the 30 m Landsat references. A PatchGAN discriminator conditioned on MODIS context enforces realism, while a composite loss blends adversarial, content, spectrum, and perceptual terms. Across an urban region of interest (ROI) and multiple regions worldwide, WGAST consistently outperforms baselines, correlates strongly with near-surface air temperature, and generalizes robustly in space and time, marking a significant advance in high-resolution daily LST mapping.
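The weak supervision idea rests on simple arithmetic: since 3 × 10 m = 30 m, each 3×3 block of predicted 10 m pixels covers exactly one 30 m Landsat pixel, so the fine-scale prediction can be checked at the coarser scale without any 10 m ground truth. The pure-Python sketch below illustrates this under stated assumptions: non-overlapping block means and a plain mean-squared-error comparison. The function names and the choice of MSE are hypothetical; WGAST's actual objective is the composite adversarial, content, spectrum, and perceptual loss described above.

```python
from typing import List

Grid = List[List[float]]  # row-major 2-D grid of LST values (kelvin)


def block_average(lst_10m: Grid, factor: int = 3) -> Grid:
    """Aggregate a 10 m grid to 30 m by averaging non-overlapping factor x factor blocks."""
    rows, cols = len(lst_10m), len(lst_10m[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by the aggregation factor")
    coarse: Grid = []
    for i in range(0, rows, factor):
        coarse_row = []
        for j in range(0, cols, factor):
            block = [lst_10m[i + di][j + dj] for di in range(factor) for dj in range(factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


def weak_supervision_loss(pred_10m: Grid, landsat_30m: Grid, factor: int = 3) -> float:
    """Hypothetical MSE between the aggregated 10 m prediction and the 30 m Landsat reference."""
    coarse = block_average(pred_10m, factor)
    if len(coarse) != len(landsat_30m) or len(coarse[0]) != len(landsat_30m[0]):
        raise ValueError("aggregated prediction and reference must have the same shape")
    sq_errors = [
        (p - r) ** 2
        for p_row, r_row in zip(coarse, landsat_30m)
        for p, r in zip(p_row, r_row)
    ]
    return sum(sq_errors) / len(sq_errors)


# Usage: a 6x6 prediction at 10 m maps onto a 2x2 grid at 30 m.
pred = [[300.0 + 3 * (i // 3) + (j // 3) for j in range(6)] for i in range(6)]
landsat = [[300.0, 301.0], [303.0, 304.0]]
print(block_average(pred))                   # [[300.0, 301.0], [303.0, 304.0]]
print(weak_supervision_loss(pred, landsat))  # 0.0
```

Note that the averaging constrains only the block means: many different 10 m patterns share the same 30 m average, which is why a weakly supervised model needs additional terms (such as the adversarial and perceptual losses) to favor realistic fine-scale detail.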
Abstract
Urbanization, climate change, and agricultural stress are increasing the demand for precise and timely environmental monitoring. Land Surface Temperature (LST) is a key variable in this context and is retrieved through satellite remote sensing. However, satellite sensors face a trade-off between spatial and temporal resolution. While spatio-temporal fusion methods offer promising solutions, few have addressed the estimation of daily LST at 10 m resolution. In this study, we present WGAST, a weakly-supervised generative network for daily 10 m LST estimation via spatio-temporal fusion of Terra MODIS, Landsat 8, and Sentinel-2. WGAST is the first end-to-end deep learning framework designed for this task. It adopts a conditional generative adversarial architecture, with a generator composed of four stages: feature extraction, fusion, LST reconstruction, and noise suppression. The first stage employs a set of encoders to extract multi-level latent representations from the inputs, which are then fused in the second stage using cosine similarity, normalization, and temporal attention mechanisms. The third stage decodes the fused features into high-resolution LST, and the fourth applies a Gaussian filter to suppress high-frequency noise. Training follows a weakly supervised strategy grounded in physical averaging principles and is reinforced by a PatchGAN discriminator. Experiments demonstrate that WGAST outperforms existing methods in both quantitative and qualitative evaluations. On average, WGAST reduces RMSE by 17.05% and improves SSIM by 4.22% relative to the best-performing baseline. Furthermore, WGAST effectively captures fine-scale thermal patterns, as validated against near-surface air temperature measurements from 33 near-ground sensors. The code is available at https://github.com/Sofianebouaziz1/WGAST.git.
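The abstract states that the decoded LST passes through a Gaussian filter to suppress high-frequency noise, but it does not give the filter parameters. The pure-Python sketch below shows what such a step does to a temperature grid; the standard deviation (sigma = 1.0 pixel), the truncation radius (2 pixels), the edge-replication border handling, and all function names are illustrative assumptions, not values taken from WGAST. On real arrays, a library routine such as SciPy's `scipy.ndimage.gaussian_filter` would normally be used instead.

```python
import math
from typing import List

Grid = List[List[float]]  # row-major 2-D grid of LST values


def gaussian_kernel_1d(sigma: float, radius: int) -> List[float]:
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _smooth_rows(grid: Grid, kernel: List[float]) -> Grid:
    """Convolve each row with the kernel, replicating edge pixels at the borders."""
    radius = len(kernel) // 2
    out: Grid = []
    for row in grid:
        n = len(row)
        new_row = []
        for j in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                jj = min(max(j + k - radius, 0), n - 1)  # clamp index to [0, n-1]
                acc += w * row[jj]
            new_row.append(acc)
        out.append(new_row)
    return out


def _transpose(grid: Grid) -> Grid:
    return [list(col) for col in zip(*grid)]


def gaussian_smooth(lst: Grid, sigma: float = 1.0, radius: int = 2) -> Grid:
    """Separable 2-D Gaussian filter: smooth along rows, then along columns."""
    kernel = gaussian_kernel_1d(sigma, radius)
    rows_done = _smooth_rows(lst, kernel)
    return _transpose(_smooth_rows(_transpose(rows_done), kernel))


# Usage: an isolated 1 K spike on a flat 5x5 background is spread out and attenuated.
field = [[0.0] * 5 for _ in range(5)]
field[2][2] = 1.0
smoothed = gaussian_smooth(field)
print(round(smoothed[2][2], 4))  # peak is well below the original 1.0
```

Because the 2-D Gaussian kernel is separable, filtering rows and then columns with the same 1-D kernel is equivalent to a full 2-D convolution while costing far fewer multiplications per pixel. Flat regions pass through unchanged, whereas isolated single-pixel spikes, typical of high-frequency noise, are strongly attenuated.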
