On the workflow, opportunities and challenges of developing foundation model in geophysics
Hanlin Sheng, Xinming Wu, Hang Gao, Haibin Di, Sergey Fomel, Jintao Li, Xu Si
TL;DR
This paper addresses the lack of a full-process review of geophysical foundation models by proposing a workflow that spans data acquisition, preprocessing, model design, pretraining, and deployment. It emphasizes domain-specific challenges (diverse multimodal geophysical data, physical consistency, and data scarcity) and discusses strategies to improve robustness and interpretability: self-supervised and multimodal pretraining, physics-informed fine-tuning, model interaction, and retrieval-grounded deployment. Key contributions include a systematic framework for building geophysical foundation models, guidance on data preparation and standardization, and deployment strategies that account for efficiency (distillation, pruning, quantization, retrieval-augmented generation (RAG), and agent-based approaches). The framework aims to accelerate practical adoption in subsurface imaging, resource exploration, disaster warning, and digital earth initiatives, enabling more scalable, interpretable, and socially responsible AI for Earth sciences.
Abstract
Foundation models have become a mainstream technology in artificial intelligence and have shown strong potential across many domains in recent years, particularly in handling complex tasks and multimodal data. In geophysics, their application is gradually expanding, but no comprehensive review yet covers the full workflow of integrating foundation models with geophysical data. To address this gap, this paper presents a complete framework that systematically covers the process of developing foundation models for geophysical data. From data collection and preprocessing to model architecture selection, pre-training strategies, and model deployment, we analyze the key techniques and methodologies at each stage. In particular, we discuss targeted solutions to the challenges posed by the diversity and complexity of geophysical data and by the need for physical consistency. We further discuss how to leverage the transfer learning capabilities of foundation models to reduce reliance on labeled data and enhance computational efficiency, and how to incorporate physical constraints into model training to improve physical consistency and interpretability. By summarizing and analyzing the current technological landscape, this paper fills the gap left by the absence of a full-process review of foundation models in geophysics and offers practical guidance for their application in geophysical data analysis.
