Forecasting the Future with Yesterday's Climate: Temperature Bias in AI Weather and Climate Models

Jacob B. Landsberg; Elizabeth A. Barnes

TL;DR

This study examines why AI weather and climate models trained on historical data struggle to predict future climates. Comparing boreal-winter 2m temperatures from two weather models (FourCastNet V2 Small, Pangu) and one climate model (ACE2) against ERA5 for periods beyond their training data, the authors find systematic cold biases: forecasts resemble climates from 15–20 years earlier, and 20–30 years earlier in some regions such as the Eastern U.S. The weather models show the strongest biases in the hottest forecasts, while ACE2 biases are greatest in the coldest forecasts, consistent with regional warming trends and training distributions. The findings highlight the extrapolation limits of data-driven models and motivate training-data augmentation and climate-robust design to reduce these biases in future climate prediction.

Abstract

AI-based climate and weather models have rapidly gained popularity, providing faster forecasts with skill that can match or even surpass that of traditional dynamical models. Despite this success, these models face a key challenge: predicting future climates while being trained only on historical data. In this study, we investigate this issue by analyzing boreal winter land temperature biases in AI weather and climate models. We evaluate two weather models, FourCastNet V2 Small (FourCastNet) and Pangu Weather (Pangu), on their predictions for 2020-2025, and one climate model, the Ai2 Climate Emulator version 2 (ACE2), on its predictions for 1996-2010. These time periods lie outside the respective models' training sets and are significantly more recent than the bulk of their training data, allowing us to assess how well the models generalize to newer conditions. We find that all three models produce cold-biased mean temperatures, resembling climates from 15-20 years earlier than the period they are predicting. In some regions, like the Eastern U.S., the predictions resemble climates from as much as 20-30 years earlier. Further analysis shows that FourCastNet's and Pangu's cold bias is strongest in the hottest predicted temperatures, indicating limited training exposure to modern extreme heat events. In contrast, ACE2's bias is more evenly distributed but largest in regions, seasons, and parts of the temperature distribution where climate change has been most pronounced. These findings underscore the challenge of training AI models exclusively on historical data and highlight the need to account for such biases when applying them to future climate prediction.
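One way to read the statement that predictions "resemble climates from 15-20 years earlier" is as a lag: find the earlier window of a reference record whose mean temperature best matches the forecast mean. The sketch below illustrates that idea only and is not necessarily the authors' procedure. The helper `climate_lag_years`, the 0.03 K/yr warming trend, and the 0.54 K cold bias are all hypothetical values chosen to make the arithmetic easy to follow.

```python
from statistics import mean


def climate_lag_years(ref_by_year, forecast_mean, eval_years):
    """Hypothetical helper: years between the evaluation window and the
    earlier reference window whose mean best matches the forecast mean."""
    width = len(eval_years)
    eval_start = eval_years[0]
    best_start, best_gap = None, float("inf")
    for start in sorted(ref_by_year):
        window = range(start, start + width)
        # Only consider complete windows that start no later than the evaluation window.
        if start > eval_start or any(y not in ref_by_year for y in window):
            continue
        gap = abs(mean(ref_by_year[y] for y in window) - forecast_mean)
        if gap < best_gap:
            best_start, best_gap = start, gap
    return eval_start - best_start


# Synthetic reference record (stand-in for a reanalysis winter mean):
# a purely linear warming of 0.03 K per year, an assumed value.
ref = {year: 270.0 + 0.03 * (year - 1980) for year in range(1980, 2026)}

eval_years = list(range(2020, 2026))
ref_eval_mean = mean(ref[y] for y in eval_years)

# Synthetic forecast mean with an invented cold bias of 0.54 K.
forecast_mean = ref_eval_mean - 0.54

bias = forecast_mean - ref_eval_mean  # negative means cold-biased
lag = climate_lag_years(ref, forecast_mean, eval_years)
print(f"mean bias: {bias:.2f} K, equivalent climate lag: {lag} years")
```

With a linear trend, the lag is simply the bias magnitude divided by the warming rate (0.54 K / 0.03 K per year = 18 years). A real record has interannual variability, so a window-matching search like this one would give a noisier estimate.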
