Cross-Domain Foundation Model Adaptation: Pioneering Computer Vision Models for Geophysical Data Analysis

Zhixiang Guo; Xinming Wu; Luming Liang; Hanlin Sheng; Nuo Chen; Zhengfa Bi

TL;DR

This work investigates cross-domain adaptation of vision foundation models to geophysical data analysis, addressing data scarcity and high computational costs. It uses DINOv2 as a feature encoder, fine-tuned with a parameter-efficient method (LoRA) and paired with decoders ranging from simple to complex, enabling effective geophysical segmentation across lunar craters, DAS seismic events, seismic facies, salt geobodies, and deep faults. Across five tasks, the adapted model outperforms a Unet baseline in $mIoU$ and $mPA$, often with minimal decoder complexity, and demonstrates robust generalization with adaptation times far shorter than training a foundation model from scratch. The results support the feasibility and practical benefits of cross-domain FM adaptation for geophysics and potentially other scientific domains, reducing data and compute barriers for advanced analyses.
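To make the parameter-efficient fine-tuning idea concrete, the following is a minimal sketch of the standard LoRA formulation, in which a frozen weight matrix W is augmented by a trainable low-rank product scaled by alpha / r. It is not the authors' implementation: the helper names (`matvec`, `init_lora`, `lora_forward`), the pure-Python list representation, and the initialization scale are assumptions chosen for illustration.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def init_lora(d_out, d_in, r, seed=0):
    """Create LoRA factors: A (r x d_in) is small random, B (d_out x r) is zero.

    Zero-initializing B means the adapted layer starts out identical to the
    frozen pre-trained layer. The 0.01 standard deviation is an assumption.
    """
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
    B = [[0.0] * r for _ in range(d_out)]
    return A, B


def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x); only A and B would be trained."""
    base = matvec(W, x)                 # frozen pre-trained path
    update = matvec(B, matvec(A, x))    # trainable low-rank path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical toy layer: 2 inputs, 3 outputs, rank 1.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
A, B = init_lora(d_out=3, d_in=2, r=1)
y0 = lora_forward(W, A, B, [1.0, 1.0], alpha=2.0, r=1)
```

With rank r, the trainable factors hold r * (d_in + d_out) values instead of the d_in * d_out values of the full weight matrix, which is where the savings in adaptation cost come from.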

Abstract

We explore adapting foundation models (FMs) from the computer vision domain to geoscience. FMs, large neural networks trained on massive datasets, excel in diverse tasks with remarkable adaptability and generality. However, geoscience faces challenges such as a lack of curated training datasets and the high computational cost of developing specialized FMs. This study considers adapting FMs from computer vision to geoscience, analyzing their scale, adaptability, and generality for geoscientific data analysis. We introduce a workflow that leverages existing computer vision FMs and fine-tunes them for geoscientific tasks, reducing development costs while enhancing accuracy. Through experiments, we demonstrate this workflow's effectiveness in a broad range of applications that process and interpret geoscientific data such as lunar images, seismic data, and DAS arrays. Our findings introduce advanced ML techniques to geoscience, demonstrating the feasibility and advantages of cross-domain FM adaptation, driving further advancements in geoscientific data analysis, and offering valuable insights for FM applications in other scientific domains.

Paper Structure (13 sections, 1 equation, 10 figures, 5 tables)

Results
Choice of pre-trained FM
Pre-trained FM for geophysical data feature representation
Efficient and generalized adaptation of FM to geophysics
Performance of adapted FM in geophysical downstream tasks
Discussion
Methods
Adaptation datasets preparation
LoRA layers for fine-tuning
Decoding Module
Weighted dice loss
Data availability
Code availability

Figures (10)

Figure 1: Workflow for adapting pre-trained foundation models to geophysics. First, we prepare geophysical training datasets (1st column), which involves collecting and processing relevant geophysical data to ensure it is suitable for adaptation fine-tuning. Next, we load the pre-trained foundation model as the data feature encoder (2nd column) and fine-tune the model to make it adaptable to geophysical data. To map the encoder features to the task-specific targets, we explore suitable decoders (3rd column) for geophysical downstream adaptation. Finally, the adapted model is applied to various downstream tasks within the geophysics field (4th column).
Figure 2: DINOv2's feature representation of geophysical data. The 1st column shows typical geophysical data, including, from top to bottom, lunar images containing craters, DAS data with seismic events, seismic data with salt domes, strata facies, and deep faults. We input these data into the pre-trained DINOv2, which serves as an encoder to compute the feature representation of the data. The RGB visualization shows the three most representative components of the geophysical data feature representation by the pre-trained DINOv2 before (2nd column) and after (3rd column) fine-tuning. We observe that DINOv2, initially pre-trained on natural images, exhibits a general capability for representing geophysical data features, forming a basis for its adaptation to geophysical tasks. Fine-tuning further enhances this feature representation (3rd column), ensuring advanced performance in geophysical applications.
Figure 3: Network architecture for adapting foundation models. We designed the adaptation network by feeding the three-channel data into the pre-trained foundation model with a ViT architecture. We employed LoRA layers to efficiently fine-tune the pre-trained ViT and enhance its feature representation of geophysical data. We also explored three different types of decoders (PUP, MLA, and DPT) for mapping the ViT features, specifically the features from the 3rd, 6th, 9th, and 12th layers, into the task-specific targets or outputs. This adaptation scheme, involving fine-tuning LoRA layers and custom decoders, enables the development of broad geophysical applications using a pre-trained vision foundation model.
Figure 4: Application of the adapted DINOv2 to various geophysical downstream tasks. Each column represents a specific downstream task, including crater detection in lunar images, seismic event detection in DAS data, seismic facies classification, salt dome geobody detection, and deep fault detection in seismic data. From top to bottom, the rows correspond to the input geophysical data, results from Unet, and the adapted DINOv2 encoder with a Linear layer, PUP decoder, DPT decoder, and MLA decoder, along with the corresponding labels. It is evident that DINOv2, when paired with different decoders, achieves good results across all tasks.
Figure 5: Performance metrics on test datasets across all tasks. For seismic facies classification, the mIoU of DINOv2 shows a significantly smaller reduction compared to Unet as the distance between the test and training data increases, indicating that DINOv2 has far superior generalization across diverse data compared to Unet. Additionally, we calculated and plotted the mIoU distribution for all tasks on the test datasets, further highlighting DINOv2's outstanding performance across various tasks. Finally, we present the overall mIoU and mPA results, showcasing the comprehensive effectiveness of the adapted DINOv2 model.
...and 5 more figures
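
The mIoU and mPA metrics reported in Figure 5 can be illustrated with a short sketch. It uses the common definitions (per-class intersection over union and per-class pixel accuracy, each averaged over the classes present); whether the paper uses exactly these conventions is an assumption, and the function names and the flattened-list input format are hypothetical.

```python
def confusion_matrix(pred, label, num_classes):
    """cm[t][p] counts pixels whose true class is t and predicted class is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, label):
        cm[t][p] += 1
    return cm


def miou_and_mpa(pred, label, num_classes):
    """Return (mIoU, mPA) for flattened integer class maps.

    IoU_c = TP / (TP + FP + FN) and PA_c = TP / (TP + FN).
    Classes absent from both prediction and label are skipped (an assumption).
    """
    cm = confusion_matrix(pred, label, num_classes)
    ious, accs = [], []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                   # missed pixels
        fp = sum(cm[t][c] for t in range(num_classes)) - tp    # false alarms
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
        if tp + fn > 0:
            accs.append(tp / (tp + fn))
    miou = sum(ious) / len(ious) if ious else 0.0
    mpa = sum(accs) / len(accs) if accs else 0.0
    return miou, mpa


# Hypothetical six-pixel example with two classes (0 = background, 1 = target).
pred = [0, 0, 1, 1, 1, 0]
label = [0, 0, 1, 1, 0, 0]
miou, mpa = miou_and_mpa(pred, label, num_classes=2)
```

In this toy case class 0 has IoU 3/4 and class 1 has IoU 2/3, so mIoU is 17/24 (about 0.708), while the per-class accuracies 3/4 and 1 give an mPA of 0.875.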
