ViSIR: Vision Transformer Single Image Reconstruction Method for Earth System Models
Ehsan Zeraatkar, Salah Faroughi, Jelena Tešić
TL;DR
This work introduces ViSIR, a hybrid Vision Transformer–SIREN framework for single-image super-resolution (SR) of Earth System Model outputs. By embedding a SIREN-based, frequency-tuned implicit representation into the final ViT layer, ViSIR mitigates spectral bias and preserves high-frequency details in SR tasks. On an E3SM-derived benchmark dataset, ViSIR improves PSNR, SSIM, and MSE over ViT, SIREN, SRCNN, and SRGAN baselines, including gains of over 10 dB PSNR relative to SIREN and strong performance in corner cases. The approach promises higher-fidelity high-resolution climate imagery; future work targets efficiency, multi-image/video extension, and uncertainty quantification to support practical deployment in climate modeling and decision-making.
Abstract
Purpose: Earth system models (ESMs) integrate the interactions of the atmosphere, ocean, land, ice, and biosphere to estimate the state of regional and global climate under a wide variety of conditions. ESMs are highly complex; thus, deep neural network architectures are used to model the complexity and store the down-sampled data. This paper proposes the Vision Transformer Sinusoidal Representation Network (ViSIR) to improve single-image super-resolution (SR) reconstruction of ESM data. Methods: ViSIR combines the SR capability of Vision Transformers (ViT) with the high-frequency detail preservation of the Sinusoidal Representation Network (SIREN) to address the spectral bias observed in SR tasks. Results: ViSIR outperforms SRCNN by 2.16 dB, ViT by 6.29 dB, SIREN by 8.34 dB, and SR Generative Adversarial Networks (SRGANs) by 7.93 dB PSNR on average across three different measurements. Conclusion: The proposed ViSIR is evaluated against state-of-the-art methods, and the results show that it outperforms them in terms of Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index Measure (SSIM).
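
For readers less familiar with the reported metrics, the following is a minimal sketch of how MSE and PSNR are conventionally computed for one reference/reconstruction pair. It is not the authors' evaluation code. The function names, the toy 2x2 images, and the peak value of 1.0 (normalized pixels) are illustrative assumptions. The helper `mse_ratio_from_db_gain` only restates the standard identity that, for a fixed peak value and a single image pair, a PSNR gain of d dB corresponds to an MSE ratio of 10^(d/10); averages over many images need not obey it exactly.

```python
import math


def mse(reference, reconstruction):
    """Mean squared error between two equally sized 2-D images (lists of rows)."""
    total = 0.0
    count = 0
    for ref_row, rec_row in zip(reference, reconstruction):
        for ref_px, rec_px in zip(ref_row, rec_row):
            total += (ref_px - rec_px) ** 2
            count += 1
    return total / count


def psnr(reference, reconstruction, max_value=1.0):
    """PSNR in dB. max_value is the peak pixel value (assumed 1.0 here)."""
    err = mse(reference, reconstruction)
    if err == 0.0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / err)


def mse_ratio_from_db_gain(delta_db):
    """MSE ratio implied by a PSNR gain of delta_db for one image pair."""
    return 10.0 ** (delta_db / 10.0)


# Toy data (hypothetical, not from the E3SM benchmark).
reference = [[0.0, 0.5], [1.0, 0.25]]
reconstruction = [[0.1, 0.5], [0.9, 0.25]]

print("MSE :", mse(reference, reconstruction))
print("PSNR:", psnr(reference, reconstruction), "dB")
print("MSE ratio for a 2.16 dB gain:", mse_ratio_from_db_gain(2.16))
```

Under these assumptions, the 2.16 dB gain over SRCNN would correspond to roughly a 1.6-fold reduction in MSE on a single image, which gives a sense of scale for the reported differences.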
