Frequency-Aware Vision Transformers for High-Fidelity Super-Resolution of Earth System Models

Ehsan Zeraatkar; Salah A Faroughi; Jelena Tešić

TL;DR

This work tackles the spectral bias challenge in downscaling Earth System Model outputs by introducing two frequency-aware super-resolution (SR) architectures, ViSIR and ViFOR, that blend Vision Transformers (ViTs) with frequency-sensitive representations. ViSIR extends a ViT with sinusoidal activations in an implicit neural representation (INR) to improve high-frequency detail, while ViFOR adds explicit Fourier-based filtering to decouple and learn low- and high-frequency content. On the E3SM-HR dataset, ViSIR substantially outperforms baselines, and ViFOR achieves state-of-the-art PSNR and SSIM across multiple climate variables, particularly when trained on full-field images. The results underscore the importance of global context and explicit frequency decomposition for climate data downscaling, with potential extensions to spatio-temporal SR and physics-constrained learning.
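To make the idea of decoupling low- and high-frequency content concrete, the following is a minimal NumPy sketch of a Fourier-domain split using an ideal radial mask. It is an illustration only, not the ViFOR filter itself: the function name `split_frequencies`, the `cutoff` parameter (in cycles per pixel), and the hard-edged mask are all assumptions introduced here.

```python
import numpy as np


def split_frequencies(field, cutoff):
    """Split a 2-D field into low- and high-frequency parts.

    Hypothetical helper for illustration. `cutoff` is a radial frequency
    threshold in cycles per pixel; components at or below it go to the
    low-frequency part, everything else to the high-frequency part.
    """
    field = np.asarray(field, dtype=float)
    spectrum = np.fft.fft2(field)

    # Frequency coordinates of each FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(field.shape[0])[:, None]
    fx = np.fft.fftfreq(field.shape[1])[None, :]
    radius = np.sqrt(fx**2 + fy**2)

    # Complementary masks, so the two parts sum back to the input.
    low_mask = radius <= cutoff
    low = np.fft.ifft2(spectrum * low_mask).real
    high = np.fft.ifft2(spectrum * ~low_mask).real
    return low, high


# Usage: a smooth gradient plus fine-scale noise (synthetic data).
rng = np.random.default_rng(0)
y, x = np.mgrid[0:64, 0:64]
field = 0.05 * x + 0.1 * rng.standard_normal((64, 64))
low, high = split_frequencies(field, cutoff=0.1)
```

Because the two masks are complementary, `low + high` reproduces the input up to floating-point error, so each band can in principle be modeled separately without losing information.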

Abstract

Super-resolution (SR) is crucial for enhancing the spatial fidelity of Earth System Model (ESM) outputs, allowing fine-scale structures vital to climate science to be recovered from coarse simulations. However, traditional deep super-resolution methods, including convolutional and transformer-based models, tend to exhibit spectral bias, reconstructing low-frequency content more readily than valuable high-frequency details. In this work, we introduce two frequency-aware frameworks: the Vision Transformer-Tuned Sinusoidal Implicit Representation (ViSIR), combining Vision Transformers and sinusoidal activations to mitigate spectral bias, and the Vision Transformer Fourier Representation Network (ViFOR), which integrates explicit Fourier-based filtering for independent low- and high-frequency learning. Evaluated on the E3SM-HR Earth system dataset across surface temperature, shortwave, and longwave fluxes, these models outperform leading CNN, GAN, and vanilla transformer baselines, with ViFOR demonstrating up to 2.6 dB improvements in PSNR and significantly higher SSIM. Detailed ablation and scaling studies highlight the benefit of full-field training, the impact of frequency hyperparameters, and the potential for generalization. The results establish ViFOR as a state-of-the-art, scalable solution for climate data downscaling. Future extensions will address temporal super-resolution, multimodal climate variables, automated parameter selection, and integration of physical conservation constraints to broaden scientific applicability.
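For readers less familiar with PSNR, the short sketch below computes it from the standard definition and converts a PSNR gain in dB into the equivalent ratio of mean squared errors. The helper names `psnr` and `mse_ratio_for_gain` and the choice of `data_range` are assumptions made for illustration; the arithmetic follows from the definition of PSNR and is not a result reported by the paper.

```python
import numpy as np


def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical fields
    return 10.0 * np.log10(data_range**2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain at fixed data_range."""
    return 10.0 ** (gain_db / 10.0)


# Synthetic check: a uniform error of 0.1 on a unit-range field gives
# MSE = 0.01, which is a PSNR of 20 dB.
reference = np.zeros((4, 4))
estimate = reference + 0.1
value_db = psnr(reference, estimate, data_range=1.0)

# A 2.6 dB gain corresponds to an MSE ratio of 10**0.26, roughly 1.8.
ratio = mse_ratio_for_gain(2.6)
```

Under this definition, a 2.6 dB gain at a fixed data range means the mean squared error falls by a factor of roughly 1.8, that is, to a little over half of its previous value.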
