
On the Foundations of Earth and Climate Foundation Models

Xiao Xiang Zhu, Zhitong Xiong, Yi Wang, Adam J. Stewart, Konrad Heidler, Yuanyuan Wang, Zhenghang Yuan, Thomas Dujardin, Qingsong Xu, Yilei Shi

Abstract

Foundation models have enormous potential for advancing Earth and climate sciences; however, current approaches may not be optimal, as they focus on only a few basic features of a desirable Earth and climate foundation model. To craft the ideal Earth foundation model, we define eleven features that would allow such a model to benefit any geoscientific downstream application in an environmental- and human-centric manner. We further shed light on the way forward to achieve the ideal model and to evaluate Earth foundation models. What comes after foundation models? Energy-efficient adaptation, adversarial defenses, and interpretability are among the emerging directions.

Paper Structure (28 sections, 6 figures, 3 tables)

Figures (6)

  • Figure 1: The scheme of an Earth and climate FM. It should be trained on common data modalities, including imagery (radar, optical), non-Euclidean data (point clouds, text), and meteorological data. It should also be consistent with physical laws. The FM is task-agnostic, as exemplified by five possible downstream tasks. Because the characteristics of Earth observation, weather, and climate data differ greatly, the resulting FM may consist of a pool of expert models with feedback loops among them. For attribution of figure elements, please see Supplementary Information.
  • Figure 1: Visual reasoning datasets from the fields of Visual Question Answering, Remote Sensing Image Captioning, Change Detection Visual Question Answering, and Image-Text Cross-Modal Retrieval in remote sensing, with data point sizes proportional to citation counts. The curves represent the learning and overfitting "frontiers" of CLIP models. CLIP-ViT-L-14 shows a transition from overfitting to learning across a wide range of image-text downstream tasks, as shown by SkyScript and RemoteCLIP (wang2023skyscript, liu2023remoteclip) and illustrated by the green curve. On the far right, CLIP-ViT-H-14 has the potential to learn adequate multimodal representations for simple downstream tasks without overfitting, but it still tends to perform worse than smaller models as the downstream tasks become more complex, as demonstrated empirically by RS5M (zhang2024rs5m). This is indicative of overfitting. The $y$-axis represents the extent to which dataset creation relied on automatic methods: manual annotations are given a score of 1, machine-assisted human annotations a score of 2, and fully automated annotations a score of 3. Note that the datasets in the figure may overlap.
  • Figure 2: Volume of EO and climate data archives and curated benchmark datasets.
  • Figure 3: The ideal Earth and climate FM. It should possess at least eight "must-have" features and three "highly desirable" features. The must-haves (1--8) are geolocation embedding, balanced geographic representation, scale awareness, wavelength embedding, the time variable, multisensory, task-agnostic, and carbon-minimized. They define the basic functionality of an ideal Earth and climate FM, i.e., the ability to support any downstream task, regardless of input data, at any location in an environmentally friendly manner. The highly desirable features (9--11) are uncertainty quantification, physical consistency, and ML assistants, which ensure the trustworthiness and human-centric design of an ideal Earth and climate FM. For attribution of figure elements, please see Supplementary Information.
  • Figure 4: Representative EO and climate FMs. Representative models are chosen based on both popularity and novelty. Note that no single model excels at all evaluation criteria, as most models focus on only one or two of all "must-have" features.
  • ...and 1 more figures