Transforming Weather Data from Pixel to Latent Space

Sijie Zhao, Feng Liu, Xueliang Zhang, Hao Chen, Tao Han, Junchao Gong, Ran Tao, Pengfeng Xiao, Lei Bai, Wanli Ouyang

TL;DR

The paper addresses the high storage cost of pixel-space weather data, and its limited applicability across pressure-variable subsets (PVS), by introducing the Weather Latent Autoencoder (WLA), which maps diverse weather variables across multiple pressure levels into a unified, low-storage latent space. WLA decouples weather reconstruction from downstream tasks using a Pressure-Variable Unified Module (PVUM), a VAEformer encoder-decoder, and a Binary Quantization Module (BQM), enabling efficient multi-PVS weather task modeling. The authors validate WLA on ERA5 data, compressing the full PVS from 244.34 TB to 0.43 TB in the ERA5-latent dataset while enabling high-fidelity reconstructions and sharper forecasts; they also release ERA5-latent to support latent-space meteorological research. This work offers a scalable pathway for multi-PVS weather modeling with reduced data costs and preserved predictive sharpness, potentially accelerating large-scale meteorological studies in latent space.

Abstract

The increasing impact of climate change and extreme weather events has spurred growing interest in deep learning for weather research. However, existing studies often rely on weather data in pixel space, which presents several challenges: overly smooth model outputs, limited applicability to a single pressure-variable subset (PVS), and high data storage and computational costs. To address these challenges, we propose a novel Weather Latent Autoencoder (WLA) that transforms weather data from pixel space to latent space, enabling efficient weather task modeling. By decoupling weather reconstruction from downstream tasks, WLA improves the accuracy and sharpness of weather task model results. The incorporated Pressure-Variable Unified Module transforms multiple PVS into a unified representation, enhancing the adaptability of the model in multiple weather scenarios. Furthermore, weather tasks can be performed in the low-storage latent space of WLA rather than in the high-storage pixel space, significantly reducing data storage and computational costs. Through extensive experimentation, we demonstrate WLA's superior compression and reconstruction performance, enabling the creation of the ERA5-latent dataset with unified representations of multiple PVS from ERA5 data. The compressed full PVS in the ERA5-latent dataset reduces the original 244.34 TB of data to 0.43 TB. The downstream task further demonstrates that task models can apply to multiple PVS with low data costs in latent space and achieve superior performance compared to models in pixel space. Code, ERA5-latent data, and pre-trained models are available at https://anonymous.4open.science/r/Weather-Latent-Autoencoder-8467.
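To put the storage figures in the abstract in perspective, the short calculation below derives the implied compression ratio, which comes to roughly 568-fold. Only the two terabyte values come from the abstract; the function and variable names are illustrative.

```python
# Storage figures quoted in the abstract, in terabytes.
PIXEL_TB = 244.34   # full PVS of ERA5 in pixel space
LATENT_TB = 0.43    # the same data in the ERA5-latent dataset


def compression_ratio(original: float, compressed: float) -> float:
    """Return how many times smaller the compressed data is."""
    return original / compressed


ratio = compression_ratio(PIXEL_TB, LATENT_TB)
fraction_kept = LATENT_TB / PIXEL_TB

print(f"compression ratio: {ratio:.0f}x")
print(f"latent size as share of original: {fraction_kept:.2%}")
```

In other words, the latent dataset occupies well under one percent of the original footprint.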

Paper Structure

This paper contains 17 sections, 1 equation, 12 figures, 1 table.

Figures (12)

  • Figure 1: Transforming weather data from a diverse, high-storage pixel space to a unified, low-storage latent space for weather tasks using the Weather Latent Autoencoder. The weather task model in pixel space suffers from high data storage and computational costs and is limited to a single pressure-variable subset, often yielding ambiguous results. In contrast, the model in latent space benefits from reduced data storage and computational costs, enabling the use of multiple pressure-variable subsets and producing sharper results.
  • Figure 2: Architecture of the Weather Latent Autoencoder, which compresses weather data from a diverse, high-storage pixel space into a unified, low-storage latent space, and reconstructs it back into the pixel space.
  • Figure 3: Workflow of the Pressure-Variable Unified Module, which transforms diverse weather data into a unified representation.
  • Figure 4: Overview of the Latent Space Framework. The data-intensive processes can be performed in the low-storage latent space, while processes requiring a smaller amount of data can be carried out in the high-storage pixel space, thereby effectively reducing data costs.
  • Figure 5: Ablation study on compression ratio and reconstruction quality of the WLA under varying input pressure levels (6, 13, 25 layers) and codebook sizes ($2^{16}$ to $2^{128}$), evaluated on the atmospheric temperature variable.
  • ...and 7 more figures
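
The codebook sizes in the Figure 5 caption ($2^{16}$ to $2^{128}$) can be read as bit budgets: a code made of k binary values has $2^k$ possible states, so each quantized latent token costs k bits. The sketch below illustrates that relationship only. The sign-threshold rule and the helper names (`bits_per_token`, `binarize`, `pack_bits`) are assumptions made for illustration and are not taken from the paper's Binary Quantization Module, whose details are not given in this summary.

```python
def bits_per_token(codebook_size: int) -> int:
    """Bits needed to index one entry in a codebook of the given size."""
    return (codebook_size - 1).bit_length()


def binarize(latent: list[float]) -> list[int]:
    """Assumed rule: map each latent value to 1 if positive, else 0."""
    return [1 if value > 0 else 0 for value in latent]


def pack_bits(bits: list[int]) -> int:
    """Pack a bit list (most significant bit first) into a codebook index."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index


# Bit cost per token at the two ends of the range in the caption.
print(bits_per_token(2**16), bits_per_token(2**128))

# A hypothetical 4-dimensional latent token, for illustration only.
token = [0.3, -1.2, 0.0, 2.5]
code = binarize(token)
print(code, pack_bits(code))
```

Under this reading, moving from the smallest to the largest codebook in the ablation multiplies the per-token storage by eight (16 bits to 128 bits), which is the trade-off between compression ratio and reconstruction quality that the figure examines.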