On the Opportunities of (Re)-Exploring Atmospheric Science by Foundation Models: A Case Study
Lujia Zhang, Hanzhe Cui, Yurong Song, Chenyue Li, Binhang Yuan, Mengqian Lu
TL;DR
The paper investigates whether a state-of-the-art multimodal foundation model, GPT-4o, can address broad atmospheric-science tasks by evaluating its performance across four task classes: climate data processing, physical diagnosis, forecast/prediction, and adaptation/mitigation. It demonstrates strong capabilities in information extraction, numerical calculations, and classical analyses (e.g., EOF via PCA) with transparent, reproducible code outputs. However, it reveals substantial limitations in producing reliable short- to long-range forecasts (e.g., 24–96 hour scales and ENSO predictions) when domain-specific models and literature are not incorporated into the workflow. The study highlights the potential of GPT-4o to automate tedious routines and support reasoning, while also underscoring the need for domain-specific foundation models and human–AI collaboration to achieve robust, physics-grounded predictions. Overall, the work provides a realistic assessment of current foundation model (FM) capabilities in atmospheric science, guiding future research toward targeted model development, data-handling improvements, and prompt-engineering strategies that leverage domain knowledge.
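The TL;DR lists EOF analysis via PCA among the classical analyses GPT-4o handled well. To make concrete what such an analysis involves, here is a minimal NumPy sketch. It is an illustration only, not the code GPT-4o produced in the study. It assumes the field is already arranged as a (time, space) array and, optionally, that area weighting follows the common sqrt(cos(latitude)) convention; the function name `eof_analysis` and the synthetic demo field are hypothetical.

```python
import numpy as np


def eof_analysis(field, n_modes=3, lat=None):
    """Compute EOFs of a (time, space) field via PCA, using a thin SVD.

    field:   2-D array, shape (n_time, n_space); each column is one grid point.
    n_modes: number of leading modes to return.
    lat:     optional 1-D array of latitudes in degrees, one per column; if given,
             columns are weighted by sqrt(cos(lat)) (an assumed, common convention),
             so the returned EOFs live in the weighted space.
    Returns (eofs, pcs, explained_variance_ratio).
    """
    x = np.asarray(field, dtype=float)
    # Anomalies: remove the time mean at every grid point.
    anomalies = x - x.mean(axis=0)
    if lat is not None:
        weights = np.sqrt(np.cos(np.deg2rad(np.asarray(lat, dtype=float))))
        anomalies = anomalies * weights[np.newaxis, :]
    # Thin SVD: anomalies = u @ diag(s) @ vt; rows of vt are orthonormal patterns.
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = vt[:n_modes]                  # spatial patterns, shape (n_modes, n_space)
    pcs = u[:, :n_modes] * s[:n_modes]   # principal-component time series, (n_time, n_modes)
    variance = s ** 2                    # variance captured by each mode (up to a constant)
    explained = variance[:n_modes] / variance.sum()
    return eofs, pcs, explained


# Demo on a hypothetical synthetic field: one spatial pattern oscillating with a
# 12-step period, plus small Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(120)
pattern = np.sin(np.linspace(0.0, np.pi, 50))
field = np.outer(np.sin(2 * np.pi * t / 12), pattern) + 0.05 * rng.standard_normal((120, 50))
eofs, pcs, explained = eof_analysis(field, n_modes=2)
print(eofs.shape, pcs.shape)  # (2, 50) (120, 2)
print(explained)              # the leading mode should dominate the variance
```

Using the SVD of the anomaly matrix rather than eigendecomposing the covariance matrix gives the same EOFs (up to sign) while avoiding forming the covariance explicitly. The sign of each EOF/PC pair is arbitrary, so comparisons with reference patterns should use absolute correlations.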
Abstract
Most state-of-the-art AI applications in atmospheric science are based on classic deep learning approaches. However, such approaches cannot automatically integrate multiple complicated procedures to construct an intelligent agent, since each functionality is enabled by a separate model trained on independent climate datasets. The emergence of foundation models, especially multimodal foundation models, with their ability to process heterogeneous input data and execute complex tasks, offers a substantial opportunity to overcome this challenge. In this report, we explore a central question: how does a state-of-the-art foundation model, GPT-4o, perform on various atmospheric-science tasks? Toward this end, we conduct a case study by categorizing the tasks into four main classes: climate data processing, physical diagnosis, forecast and prediction, and adaptation and mitigation. For each task, we comprehensively evaluate GPT-4o's performance and discuss the results concretely. We hope this report sheds new light on future AI applications and research in atmospheric science.
