Foundation Models For Seismic Data Processing: An Extensive Review

Fabian Fuchs, Mario Ruben Fernandez, Norman Ettrich, Janis Keuper

TL;DR

This paper evaluates how natural-image foundation models (FMs) can be repurposed for three seismic processing tasks (demultiple, interpolation, and denoising) within an encoder-decoder framework. By benchmarking a broad mix of hierarchical and non-hierarchical FMs with transformer, convolutional, and hybrid architectures, across pre-training strategies (primarily self-supervised) and downstream training strategies, it finds that hierarchical models and self-supervised pre-training generally improve performance, with Swin and ConvNeXt emerging as strong performers. It also shows that natural-image pre-training transfers robustly, though the benefit depends on dataset size, model architecture, and the amount of task-specific data available. The study offers a practical framework, open datasets, and code to facilitate reproducibility and future seismic foundation-model research, and highlights directions such as seismic-specific pre-training and broader generalization assessments.

Abstract

Seismic processing plays a crucial role in transforming raw data into high-quality subsurface images, pivotal for various geoscience applications. Despite its importance, traditional seismic processing techniques face challenges such as noisy and damaged data and the reliance on manual, time-consuming workflows. The emergence of deep learning approaches has introduced effective and user-friendly alternatives, yet many of these deep learning approaches rely on synthetic datasets and specialized neural networks. Recently, foundation models have gained traction in the seismic domain, due to their success in the natural image domain. Therefore, we investigate the application of natural image foundation models on the three seismic processing tasks: demultiple, interpolation, and denoising. We evaluate the impact of different model characteristics, such as pre-training technique and neural network architecture, on performance and efficiency. Rather than proposing a single seismic foundation model, we critically examine various natural image foundation models and suggest some promising candidates for future exploration.

Paper Structure

This paper contains 25 sections, 8 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: Comparison between specialized neural networks and foundation models (FMs). Specialized neural networks, like UNets, are trained end to end for each downstream task. In comparison, an FM is pre-trained once and then fine-tuned for each downstream task. The pre-training is done through self-supervised learning on real field data and the fine-tuning through supervised learning on synthetic data. The parts of the model denoted as D are task-dependent decoders.
  • Figure 2: Architecture of the encoder-decoder model, with an arbitrary FM that produces a four-stage feature map as the image encoder and a UNet-style decoder. Both hierarchical and non-hierarchical features are shown. Labels mark the components of the encoder-decoder network that are affected by pre-training and those that are affected by the different downstream training strategies.
  • Figure 3: Comparison of the different downstream training strategies. Frozen encoder freezes a natural-image pre-trained FM and only trains the decoder. Fine-tuned encoder fine-tunes a natural-image pre-trained FM and trains the decoder. Non-pre-trained encoder trains the whole encoder-decoder model from scratch. While all three downstream training strategies use supervised training, the pre-training method differs. The specific pre-training method used is noted in Table 1. Additionally, a short introduction to each pre-training method considered can be found in the section on training strategies.
  • Figure 4: Combined performance of the different FMs plotted against their number of parameters. The left plot shows all tested FMs and the right plot only the FMs with a combined SSIM above $2.5$, to highlight the differences between the well-performing models. The well-performing models are Swin, ConvNeXt, Hiera, CAFormer, and MambaOut. While most of the well-performing models have a similar number of parameters, Hiera is significantly smaller.
  • Figure 5: Combined performance of the different FMs plotted against the size of the pre-training dataset. The left plot shows all tested FMs and the right plot only the FMs with a combined SSIM above $2.5$, to highlight the differences between the well-performing models. Additionally, the right plot uses a logarithmic x-axis to reveal the differences between the smaller datasets, which are not visible on a linear scale. The well-performing models are Swin, ConvNeXt, Hiera, CAFormer, and MambaOut. Of these, Hiera and MambaOut were trained on a significantly larger dataset and ConvNeXt on the smallest.
  • ...and 5 more figures
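
The three downstream training strategies described in the Figure 3 caption can be summarized as a small decision table. The following is a minimal sketch in plain Python, not the authors' code: the names `TrainingPlan`, `plan_for`, and `trainable_parameters`, the strategy keys, and the parameter names are all hypothetical and chosen only for illustration. It shows which parts of the encoder-decoder model start from natural-image weights and which parts would be updated during supervised downstream training.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingPlan:
    """Hypothetical description of one downstream training strategy."""
    load_pretrained_weights: bool  # start the encoder from natural-image weights?
    train_encoder: bool            # update encoder parameters downstream?
    train_decoder: bool            # the decoder is trained in all three strategies


def plan_for(strategy: str) -> TrainingPlan:
    """Map a strategy name (assumed keys) to its training plan."""
    plans = {
        # Pre-trained encoder kept fixed; only the decoder is trained.
        "frozen": TrainingPlan(True, False, True),
        # Pre-trained encoder is fine-tuned together with the decoder.
        "fine_tuned": TrainingPlan(True, True, True),
        # No pre-training; the whole model is trained from scratch.
        "non_pre_trained": TrainingPlan(False, True, True),
    }
    if strategy not in plans:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return plans[strategy]


def trainable_parameters(plan, encoder_params, decoder_params):
    """Return the parameter names an optimizer would update under this plan."""
    selected = []
    if plan.train_encoder:
        selected.extend(encoder_params)
    if plan.train_decoder:
        selected.extend(decoder_params)
    return selected


if __name__ == "__main__":
    encoder = ["encoder.stage1.weight", "encoder.stage2.weight"]  # illustrative names
    decoder = ["decoder.up1.weight"]
    for name in ("frozen", "fine_tuned", "non_pre_trained"):
        plan = plan_for(name)
        print(name, plan.load_pretrained_weights,
              trainable_parameters(plan, encoder, decoder))
```

In a deep learning framework, the frozen case is typically realized by excluding the encoder parameters from gradient updates (for example, setting `requires_grad = False` on them in PyTorch), while the other two cases differ only in how the encoder is initialized.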