A foundation model empowered by a multi-modal prompt engine for universal seismic geobody interpretation across surveys

Hang Gao, Xinming Wu, Luming Liang, Hanlin Sheng, Xu Si, Gao Hui, Yaxing Li

TL;DR

To the authors' knowledge, this is the first highly scalable and versatile multi-modal foundation model capable of interpreting any geobody across surveys while supporting real-time interaction. The approach establishes a new paradigm for geoscientific data interpretation, with broad potential for transfer to other tasks.

Abstract

Seismic geobody interpretation is crucial for structural geology studies and various engineering applications. Existing deep learning methods show promise but lack support for multi-modal inputs and struggle to generalize to different geobody types or surveys. We introduce a promptable foundation model for interpreting any geobodies across seismic surveys. This model integrates a pre-trained vision foundation model (VFM) with a sophisticated multi-modal prompt engine. The VFM, pre-trained on massive natural images and fine-tuned on seismic data, provides robust feature extraction for cross-survey generalization. The prompt engine incorporates multi-modal prior information to iteratively refine geobody delineation. Extensive experiments demonstrate the model's superior accuracy, scalability from 2D to 3D, and generalizability to various geobody types, including those unseen during training. To our knowledge, this is the first highly scalable and versatile multi-modal foundation model capable of interpreting any geobodies across surveys while supporting real-time interactions. Our approach establishes a new paradigm for geoscientific data interpretation, with broad potential for transfer to other tasks.

Paper Structure (16 sections, 4 equations, 11 figures, 2 tables)


Figures (11)

  • Figure 1: Workflow for developing our Segment Any Geobodies (SAG) model, which works across surveys. We start by collecting diverse seismic images and labeling various geobodies to construct a multi-geobody dataset (first column). We then integrate a pre-trained foundation model with a well-designed prompt engine and decoder (second column), which are fine-tuned and trained on the multi-geobody dataset. The resulting model achieves real-time, interactive segmentation of any geobody across various seismic surveys (third column). Moreover, without retraining, the model can be directly extended to interpret 3D geobodies and geobody types unseen in the training dataset.
  • Figure 2: Network architecture, training strategy, and interpretability of the image encoder. a The SAG model includes an image encoder for characterizing seismic data, a prompt encoder for encoding diverse prompts, and a mask decoder for generating masks. During training, the pre-trained image encoder remains frozen, while its associated LoRA modules, the prompt encoder, and the mask decoder are trained. b LoRA module for fine-tuning the pre-trained model. LoRA adapters are inserted into each attention module within the image encoder, so the encoder is adapted through the adapters' low-rank weights while its pre-trained weights stay frozen. c Point, box, and well-logging prompts input to the prompt encoder. d Visual interpretability analysis of the hidden representations of seismic features from the pre-trained model (SAM) and the fine-tuned model (SAG). Seismic images containing various geobodies (first row) are input into SAM and SAG to visualize their respective feature maps (second and third rows).
  • Figure 3: Visual interpretability analysis of the performance of SAG and its prompt engine. a t-SNE dimensionality reduction of hidden representations from SAM and SAG. t-SNE is a dimensionality reduction technique primarily used for visualizing high-dimensional data. b Impact of different prompts on the feature space of the mask decoder. Iteratively updating the prompts helps the decoder obtain more accurate hidden representations of the target geobody. c Workflow for interpreting multi-type and multi-instance geobodies in a field seismic image with a single SAG model.
  • Figure 4: Quantitative and comparative evaluation on test datasets. a Visualization of multi-geobody predictions by different models on the test set. b Evaluation metrics for the interpretation of various geobodies by different models. c Limitations of CNN-based methods for strata interpretation across different seismic data. CNN-based approaches require training a specialized model for each seismic survey to interpret its strata, whereas a single SAG model can segment strata across various surveys. d Comparison of how model performance varies with distance from the training set. The difference between the test samples and the training set increases with distance.
  • Figure 5: Application of the SAG model to interpret 3D geobodies and geobody types unseen in the training dataset. a 3D geobody interpretation is implemented as sequential 2D predictions, recursively using previous predictions to automatically generate prompts that guide the next predictions and maintain consistency. b Visualization of the 3D channel likelihood volume predicted by SAG. c Modeling of a 3D deep-water channel system from the likelihood volume. d 3D interpretation of instant paleokarsts, salt body, fault damage zone, volcanic complex and overlying channel system.
  • ...and 6 more figures
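
The slice-by-slice strategy described in the Figure 5a caption (sequential 2D predictions, with each prediction used to generate the prompt for the next slice) can be illustrated with a minimal sketch. This is not the authors' implementation: the captions do not say which prompt type is generated automatically, so the sketch assumes a padded bounding box. The names `box_from_mask`, `propagate`, and `threshold_segmenter` are hypothetical, and `threshold_segmenter` is a toy stand-in for the real model.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]        # one 2D seismic slice (toy amplitudes)
Mask = List[List[int]]           # binary mask, 1 = geobody
Box = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive


def box_from_mask(mask: Mask, margin: int = 1) -> Optional[Box]:
    """Bounding box of all nonzero pixels, padded by `margin` and clipped."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # empty mask: nothing to propagate
    cols = [c for row in mask for c, v in enumerate(row) if v]
    height, width = len(mask), len(mask[0])
    return (
        max(min(rows) - margin, 0),
        max(min(cols) - margin, 0),
        min(max(rows) + margin, height - 1),
        min(max(cols) + margin, width - 1),
    )


def propagate(
    slices: List[Image],
    seed_box: Box,
    segment_fn: Callable[[Image, Box], Mask],
) -> List[Mask]:
    """Segment slices in order, prompting each one with the previous mask's box."""
    masks: List[Mask] = []
    box: Optional[Box] = seed_box
    for image in slices:
        if box is None:
            # Assumption: once the geobody disappears, later slices stay empty.
            masks.append([[0] * len(image[0]) for _ in image])
            continue
        mask = segment_fn(image, box)
        masks.append(mask)
        box = box_from_mask(mask)  # prompt for the next slice
    return masks


def threshold_segmenter(image: Image, box: Box) -> Mask:
    """Toy stand-in for the model: mark pixels above 0.5 inside the box."""
    r0, c0, r1, c1 = box
    return [
        [1 if r0 <= r <= r1 and c0 <= c <= c1 and v > 0.5 else 0
         for c, v in enumerate(row)]
        for r, row in enumerate(image)
    ]


def make_slice(col: int) -> Image:
    """4x5 slice with a bright two-pixel body in rows 1-2 of column `col`."""
    return [[1.0 if r in (1, 2) and c == col else 0.0 for c in range(5)]
            for r in range(4)]


# The body drifts one column per slice; only the first slice gets a manual prompt.
volume = [make_slice(1), make_slice(2), make_slice(3)]
masks = propagate(volume, seed_box=(1, 1, 2, 1), segment_fn=threshold_segmenter)
```

In this toy volume the one-pixel margin is what lets the box follow the drifting body from slice to slice. With a real model, the margin and the choice of prompt type would need tuning against how quickly the geobody changes between adjacent slices.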