Transforming Weather Data from Pixel to Latent Space
Sijie Zhao, Feng Liu, Xueliang Zhang, Hao Chen, Tao Han, Junchao Gong, Ran Tao, Pengfeng Xiao, Lei Bai, Wanli Ouyang
TL;DR
The paper addresses the high storage cost of pixel-space weather data, and its limited applicability across pressure-variable subsets (PVS), by introducing the Weather Latent Autoencoder (WLA), which maps diverse weather variables at multiple pressure levels into a unified, low-storage latent space. WLA decouples weather reconstruction from downstream tasks using a Pressure-Variable Unified Module (PVUM), a VAEformer encoder-decoder, and a Binary Quantization Module (BQM), enabling efficient weather task modeling across multiple PVS. The authors validate WLA on ERA5 data, compressing the full PVS from 244.34 TB to 0.43 TB in the ERA5-latent dataset while enabling high-fidelity reconstructions and sharper forecasts; they also release ERA5-latent to support latent-space meteorological research. This work offers a scalable pathway for multi-PVS weather modeling with reduced data costs and preserved predictive sharpness, potentially accelerating large-scale meteorological studies in latent space.
Abstract
The increasing impact of climate change and extreme weather events has spurred growing interest in deep learning for weather research. However, existing studies often rely on weather data in pixel space, which presents several challenges: overly smooth model outputs, applicability limited to a single pressure-variable subset (PVS), and high data storage and computational costs. To address these challenges, we propose a novel Weather Latent Autoencoder (WLA) that transforms weather data from pixel space to latent space, enabling efficient weather task modeling. By decoupling weather reconstruction from downstream tasks, WLA improves the accuracy and sharpness of weather task model results. The incorporated Pressure-Variable Unified Module transforms multiple PVS into a unified representation, enhancing the adaptability of the model to multiple weather scenarios. Furthermore, weather tasks can be performed in the low-storage latent space of WLA rather than in the high-storage pixel space, significantly reducing data storage and computational costs. Through extensive experiments, we demonstrate WLA's superior compression and reconstruction performance, which enables the creation of the ERA5-latent dataset with unified representations of multiple PVS from ERA5 data. The compressed full PVS in the ERA5-latent dataset reduces the original 244.34 TB of data to 0.43 TB. The downstream task further demonstrates that task models can be applied to multiple PVS at low data cost in latent space and achieve superior performance compared to models in pixel space. Code, ERA5-latent data, and pre-trained models are available at https://anonymous.4open.science/r/Weather-Latent-Autoencoder-8467.
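
To put the quoted storage figures in perspective, the following minimal sketch works out the implied compression ratio and storage saving. Only the two sizes (244.34 TB and 0.43 TB) come from the abstract; the function and variable names are illustrative and are not taken from the released code.

```python
# Sizes quoted in the abstract, in terabytes.
PIXEL_TB = 244.34   # full PVS of ERA5 in pixel space
LATENT_TB = 0.43    # same data in the ERA5-latent dataset


def compression_ratio(original: float, compressed: float) -> float:
    """Return how many times smaller the compressed data is."""
    if original <= 0 or compressed <= 0:
        raise ValueError("sizes must be positive")
    return original / compressed


def storage_saving(original: float, compressed: float) -> float:
    """Return the fraction of storage removed by compression."""
    if original <= 0 or compressed <= 0:
        raise ValueError("sizes must be positive")
    return 1.0 - compressed / original


# Hypothetical helper names; only the inputs are from the paper.
ratio = compression_ratio(PIXEL_TB, LATENT_TB)
saving = storage_saving(PIXEL_TB, LATENT_TB)

print(f"compression ratio: {ratio:.0f}x")
print(f"storage saved: {saving:.2%}")
```

Under these figures the latent dataset is roughly 568 times smaller than the pixel-space data, a storage saving of about 99.8%.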
