When Geoscience Meets Generative AI and Large Language Models: Foundations, Trends, and Future Challenges

Abdenour Hadid, Tanujit Chakraborty, Daniel Busby

TL;DR

This survey examines how Generative AI and Large Language Models can transform geoscience, covering GAN-based data generation, foundation-model reasoning, and physics-informed approaches. It highlights concrete systems such as SeismoGen, CityGAN, GeoChat, TimeGPT, and K2, and discusses essential benchmark datasets and data sources. The authors describe practical applications across reservoir engineering, facies modeling, remote sensing, and time-series forecasting, while candidly addressing challenges in data quality, computational demands, and trustworthiness. The work aims to guide researchers and practitioners toward responsible, scalable adoption of GAI in Earth system sciences and outlines directions for future research and governance.

Abstract

Generative Artificial Intelligence (GAI) represents an emerging field that promises the creation of synthetic data and outputs in different modalities. GAI has recently shown impressive results across a large spectrum of applications, including biology, medicine, education, legislation, computer science, and finance. In the pursuit of enhanced safety, efficiency, and sustainability, generative AI emerges as a key differentiator and promises a paradigm shift in the field. This paper explores the potential applications of generative AI and large language models in geoscience. Recent developments in machine learning and deep learning have made generative models useful for tackling diverse prediction, simulation, and multi-criteria decision-making challenges related to geoscience and Earth system dynamics. This survey discusses several GAI models that have been used in geoscience, including generative adversarial networks (GANs), physics-informed neural networks (PINNs), and generative pre-trained transformer (GPT)-based structures. These tools have helped the geoscience community in several applications, including (but not limited to) data generation/augmentation, super-resolution, panchromatic sharpening, haze removal, restoration, and land surface changing. Some challenges remain, such as ensuring physical interpretability, guarding against nefarious use cases, and establishing trustworthiness. Beyond that, GAI models show promise for the geoscience community, especially in support of climate change research, urban science, atmospheric science, marine science, and planetary science, through their ability to perform data-driven modeling and uncertainty quantification.

Paper Structure

This paper contains 10 sections, 3 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: The relation between large language models (LLMs), generative AI, and other related learning schemes.
  • Figure 2: Some potential application domains of generative AI in geoscience.
  • Figure 3: Commonly used Generative AI models for geoscience applications. These deep learning (DL) frameworks learn features automatically from the data. GANs work as an unsupervised framework, whereas PINNs are hybrid physics+DL models for modeling system dynamics and solving differential equations. Foundation models mainly focus on in-context learning by using a pre-trained architecture. These models are widely applicable to geoscience applications.
  • Figure 4: Overview of a Generative AI model pipeline for geoscience applications. Multimodal self-supervised learning algorithms are trained on various data types (image, text, speech, numerical data) using multidimensional geoscience data from satellite imagery, weather, earth observations, and rivers. An example of prompt engineering using GeoChat is shown (bottom right) for demonstration.
  • Figure 5: Connections between geoscience foundation models and their broad usage in geosciences and related technologies. (Left) Data collected from various sources in geoscience, including space, air, ground, and ocean machinery, provide multimodal training data for GAI models, which analyze the data and perform user-specified tasks in real time to support geoscience research and advancements.
  • ...and 1 more figure