WeatherFormer: A Pretrained Encoder Model for Learning Robust Weather Representations from Small Datasets

Adib Hasan; Mardavij Roozbehani; Munther Dahleh

TL;DR

WeatherFormer introduces a pretrained weather encoder that learns robust representations from a large satellite-based pretraining corpus, enabling effective learning on small downstream datasets. The model uses a novel spatiotemporal positional encoding and a pretraining task that predicts masked weather variables, improving generalization to tasks with limited observations. In finetuning experiments, WeatherFormer achieves state-of-the-art performance in county-level soybean yield prediction and influenza forecasting in NYC, demonstrating transfer to agriculture and epidemiology domains. The work suggests pretrained weather encoders can unlock performance gains across weather-dependent applications and motivates yearly retraining with updated data.

Abstract

This paper introduces WeatherFormer, a transformer encoder-based model designed to learn robust weather features from minimal observations. It addresses the challenge of modeling complex weather dynamics from small datasets, a bottleneck for many prediction tasks in agriculture, epidemiology, and climate science. WeatherFormer was pretrained on a large dataset comprising 39 years of satellite measurements across the Americas. With a novel pretraining task and finetuning, WeatherFormer achieves state-of-the-art performance in county-level soybean yield prediction and influenza forecasting. Technical innovations include a unique spatiotemporal encoding that captures geographical, annual, and seasonal variations, an adaptation of the transformer architecture to continuous weather data, and a pretraining strategy that learns representations robust to missing weather features. This paper demonstrates for the first time the effectiveness of pretraining large transformer encoder models for weather-dependent applications across multiple domains.
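The pretraining idea described above, hiding some weather variables and asking the model to recover them, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mask_features` and `masked_mse`, the choice to zero out hidden variables, and the use of mean squared error on the hidden variables are all assumptions made for illustration.

```python
import random


def mask_features(sequence, num_masked, rng):
    """Hide `num_masked` weather variables across a whole sequence.

    sequence: list of time steps, each a list of floats (one per variable).
    Returns (masked_sequence, feature_mask). feature_mask[j] is 0.0 for a
    hidden variable and 1.0 for a visible one (assumed convention).
    """
    num_features = len(sequence[0])
    hidden = set(rng.sample(range(num_features), num_masked))
    feature_mask = [0.0 if j in hidden else 1.0 for j in range(num_features)]
    # Multiplying by the mask zeroes the hidden variables at every time step.
    masked = [[x * m for x, m in zip(step, feature_mask)] for step in sequence]
    return masked, feature_mask


def masked_mse(predicted, target, feature_mask):
    """Mean squared error computed over the hidden variables only."""
    errors = [
        (p - t) ** 2
        for pred_step, target_step in zip(predicted, target)
        for p, t, m in zip(pred_step, target_step, feature_mask)
        if m == 0.0
    ]
    # With nothing hidden there is nothing to reconstruct.
    return sum(errors) / len(errors) if errors else 0.0


# Toy sequence: 2 time steps, 3 weather variables (made-up numbers).
rng = random.Random(0)
weather = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
masked_weather, mask = mask_features(weather, num_masked=1, rng=rng)
```

In an actual training loop, a model would receive `masked_weather` and `mask`, produce a full reconstruction, and be scored with something like `masked_mse` against the original `weather`.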

Paper Structure (35 sections, 6 equations, 8 figures, 5 tables)

Introduction
Background and Related Work
Foundational Models in Deep Learning
Self-Supervised Learning
Machine Learning-based Yield Prediction
Influenza Forecasting
Data Collection
Pretraining Dataset
Finetuning Datasets
Architecture
Feature Mask and Padding Mask
Scaling Parameters
Spatiotemporal Positional Encoding
Transformer Encoder and Output Projection
Pretraining
...and 20 more sections

Figures (8)

Figure 1: A large transformer model (WeatherFormer) is pretrained on a massive satellite-based weather dataset, enabling the model to learn rich representations of weather during pretraining. This pretrained model can extract robust weather features for a new prediction task (Small Dataset). Even when only a limited number of weather measurements are available for a specific task, the learned weather features can be used to improve the prediction accuracy.
Figure 2: County-level soybean yield prediction and influenza forecasting for New York City were used as the finetuning tasks for WeatherFormer. The mean soybean yield gradually increased due to hybrid vigor and better farming practices. The influenza seasons, on the other hand, show clear winter peaks until 2019, after which the patterns became irregular due to COVID-19.
Figure 3: The forward pass of the weather inputs through the WeatherFormer architecture. The input is first multiplied with learnable input scalers and a feature mask and then projected to a hidden dimension through a linear layer. After that, the input goes through a transformer encoder with a novel spatiotemporal encoding mechanism and finally, the input is projected to an output dimension.
Figure 4: Soybean yield predictor architectures utilizing WeatherFormer. The weather measurements for the current year and the last few years are processed with WeatherFormer and the soil measurements are processed with a CNN reported in khaki2019cnn. Then the yield is predicted with either a linear layer or a transformer. The entire model is trained at once. Since yield for the current year is the target variable, it is replaced in the input with last year's yield.
Figure 5: Influenza-Like Illness (ILI) percent predictor architecture utilizing WeatherFormer. The weather measurements are first processed with WeatherFormer to extract useful features. These weather features and the past influenza data are then processed by a transformer to predict ILI percent for the next 10 weeks.
...and 3 more figures
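
The input stage described in the Figure 3 caption (multiply by learnable input scalers and a feature mask, then project to a hidden dimension with a linear layer) can be sketched for a single time step as follows. This is an illustrative sketch, not the paper's code: the function name `input_stage`, the mask convention (1.0 for an available variable, 0.0 for a missing one), and the toy parameter values are assumptions, and the transformer encoder and output projection that follow are omitted.

```python
def input_stage(x, scalers, feature_mask, weight, bias):
    """Scale, mask, and linearly project one time step of weather data.

    x:            list of floats, one per weather variable
    scalers:      per-variable scaling factors (learnable in a real model)
    feature_mask: 1.0 for available variables, 0.0 for missing ones
    weight:       hidden_dim rows, each with one entry per weather variable
    bias:         hidden_dim floats
    """
    # Elementwise scaling and masking; masked variables contribute nothing.
    scaled = [xi * si * mi for xi, si, mi in zip(x, scalers, feature_mask)]
    # Linear projection to the hidden dimension: weight @ scaled + bias.
    return [
        sum(w * v for w, v in zip(row, scaled)) + b
        for row, b in zip(weight, bias)
    ]


# Toy example: 2 weather variables, hidden dimension 2, second variable missing.
hidden = input_stage(
    x=[2.0, 3.0],
    scalers=[0.5, 2.0],
    feature_mask=[1.0, 0.0],
    weight=[[1.0, 1.0], [2.0, -1.0]],
    bias=[0.0, 1.0],
)
```

Because the mask enters multiplicatively before the projection, a missing variable has no influence on the hidden vector, whatever value was stored in its slot.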
