Transformer based super-resolution downscaling for regional reanalysis: Full domain vs tiling approaches

Antonio Pérez, Mario Santa Cruz, Daniel San Martín, José Manuel Gutiérrez

Abstract

Super-resolution (SR) is a promising, cost-effective downscaling methodology for producing high-resolution climate information from coarser counterparts. A particular application is downscaling regional reanalysis outputs (the predictand) from their driving global counterparts (the predictor). This study intercompares several SR downscaling methods, focusing on temperature and using the CERRA reanalysis (5.5 km resolution, produced with a regional atmospheric model driven by ERA5) as an example. The method proposed in this work is the Swin transformer; two alternative methods (the fully convolutional U-Net and the convolutional and dense DeepESD) and simple bicubic interpolation serve as benchmarks. We compare two approaches: the standard one, which uses the full domain as input, and a more scalable tiling approach, which divides the full domain into tiles that are used as input. The methods are trained to downscale CERRA surface temperature from the temperature information of the driving ERA5; in addition, the tiling approach includes static orographic information. We show that the tiling approach, which requires spatial transferability, comes at the cost of lower performance (although it outperforms some full-domain benchmarks), but provides an efficient, scalable solution that allows SR downscaling on a pan-European scale and is valuable for real-time applications.

Paper Structure

This paper contains 12 sections, 7 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Spatial domain of the data sources used in this study. (top) High-resolution topography of the region of interest acquired from the NASA Shuttle Radar Topography Mission (SRTM). (bottom left) The ERA5 domain covers a range from 12°E to 8°W longitude and 33°N to 47°N latitude, with an elevation map at a spatial resolution of 0.25°. (bottom right) The CERRA domain spans from 10°E to 6°W longitude and 35°N to 45°N latitude, with an elevation map at a finer spatial resolution of 0.05°. The left panel shows the elevation from ERA5, while the right panel displays the higher-resolution elevation from CERRA.
  • Figure 2: Diagram of the Swin2SR model for the full-domain approach illustrating the integration of the Swin v2 transformer model with a preprocessing module for upscaling spatial resolution. The full model upscales the processed input by a factor of 4, ensuring the output shape matches the region size of (200, 320).
  • Figure 3: Visual representation of the data division into patches in the tiling implementation. On the left side, ERA5 orography data with an example of two different CERRA patches (red) surrounded by their corresponding ERA5 patch (blue) of size (13, 13). On the right side, CERRA orography with the full domain divided into 40 equal-size tiles.
  • Figure 4: (Left) Schematic representation of the Swin2SR model for the patches approach, depicting the segmentation of the input data into smaller patches. The model upscales the processed patch by a factor of 4, ensuring an output shape of size (40, 40). (Right) Detailed view of the Swin2SR model blocks. The Processor Block (highlighted in pink) initiates the process with a 1x1 convolution followed by a centre crop, reducing the dimensions of the input data to a more manageable size for subsequent operations. The Upscaling Block (highlighted in blue) handles the enhancement of resolution, where concatenation with encoded high-resolution covariates is followed by a sequence of Swin2SR operations and pixel shuffling, progressively increasing the resolution from smaller patches to a final output size of 40x40 pixels. Finally, the Denoising Block (highlighted in red) further processes the upscaled data to reduce noise, applying batch normalisation, another Swin2SR operation, and a final pixel shuffle to ensure the output is both high-resolution and clean.
  • Figure 5: Architecture of the UNet model and detailed representation of its encoder-decoder block.
  • ...and 5 more figures
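
The grid sizes quoted in the captions above can be cross-checked with a little arithmetic. The following is a minimal sketch, not the authors' code: the helper names `grid_shape` and `tile_origins` are hypothetical, and it assumes non-overlapping tiles laid out on a regular 0.05° grid over the CERRA domain given in Figure 1. Under those assumptions it reproduces the (200, 320) output shape from Figure 2 and the 40 tiles of size (40, 40) from Figures 3 and 4.

```python
def grid_shape(lat_min, lat_max, lon_min, lon_max, res_deg):
    """Number of grid cells (rows, cols) covering a lat/lon box at a given resolution."""
    rows = round((lat_max - lat_min) / res_deg)
    cols = round((lon_max - lon_min) / res_deg)
    return rows, cols


def tile_origins(shape, tile):
    """Top-left (row, col) index of each non-overlapping tile.

    Raises ValueError if the grid is not an exact multiple of the tile size.
    """
    rows, cols = shape
    tile_rows, tile_cols = tile
    if rows % tile_rows or cols % tile_cols:
        raise ValueError("grid is not an exact multiple of the tile size")
    return [
        (r, c)
        for r in range(0, rows, tile_rows)
        for c in range(0, cols, tile_cols)
    ]


# CERRA domain from the Figure 1 caption: 35N to 45N, 6W to 10E, at 0.05 degrees.
# West longitudes are written as negative values (an assumption of this sketch).
cerra_shape = grid_shape(35.0, 45.0, -6.0, 10.0, 0.05)

# Tile size from the Figure 4 caption: each tile output is (40, 40).
tiles = tile_origins(cerra_shape, (40, 40))

# With the x4 upscaling factor quoted in Figure 4, a (40, 40) output implies
# a (10, 10) feature map entering the upscaling stage (inferred, not stated).
pre_upscale = (40 // 4, 40 // 4)

print(cerra_shape, len(tiles), pre_upscale)  # (200, 320) 40 (10, 10)
```

The (10, 10) size before upscaling is an inference from the stated factor of 4. It is consistent with the centre crop applied to the (13, 13) ERA5 patch in the Processor Block, but the captions do not give the crop size explicitly.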