A generative foundation model for an all-in-one seismic processing framework

Shijun Cheng, Randy Harsuko, Tariq Alkhalifah

TL;DR

The paper introduces GSFM, a generative diffusion-based framework that learns the joint distribution of clean, complete, broadband seismic data to tackle multi-task seismic processing (denoising, backscattered noise attenuation, interpolation, and low-frequency extrapolation). It combines synthetic-data pre-training with iterative fine-tuning on field data based on self-supervised learning (SSL), and uses a target-oriented x0-prediction to produce efficient, high-quality predictions in a single sampling step. Key contributions include multi-task encoding via class labels, an enhanced U-Net architecture with time and task embeddings, and an uncertainty-quantification mechanism inherent to diffusion models to gauge processing reliability. The results demonstrate competitive synthetic performance and superior field-data generalization, with an uncertainty-guided fine-tuning strategy that offers practical benefits for real-world seismic workflows.

Abstract

Seismic data often face challenges in their utilization due to noise contamination, incomplete acquisition, and limited low-frequency information, which hinder accurate subsurface imaging and interpretation. Traditional processing methods rely heavily on task-specific designs to address these challenges and fail to account for the variability of data. To address these limitations, we present a generative seismic foundation model (GSFM), a unified framework based on generative diffusion models (GDMs), designed to tackle multi-task seismic processing challenges, including denoising, backscattered noise attenuation, interpolation, and low-frequency extrapolation. GSFM leverages a pre-training stage on synthetic data to capture the features of clean, complete, and broadband seismic data distributions and applies an iterative fine-tuning strategy to adapt the model to field data. By adopting a target-oriented diffusion process prediction, GSFM improves computational efficiency without compromising accuracy. Synthetic data tests demonstrate GSFM surpasses benchmarks with equivalent architectures in all tasks and achieves performance comparable to traditional pre-training strategies, even after their fine-tuning. Also, field data tests suggest that our iterative fine-tuning approach addresses the generalization limitations of conventional pre-training and fine-tuning paradigms, delivering significantly enhanced performance across diverse tasks. Furthermore, GSFM's inherent probabilistic nature enables effective uncertainty quantification, offering valuable insights into the reliability of processing results.
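The target-oriented (x0) prediction mentioned above can be illustrated with a minimal NumPy sketch of a standard diffusion forward process and an x0-prediction loss. This is not the authors' implementation: the linear noise schedule, the 1000 steps, the function names, and the stand-in `oracle` model are all assumptions made for illustration. In the standard formulation, a noisy sample is x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, and an x0-prediction network is trained to return x0 directly instead of the noise eps.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_noise(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x0) for the standard Gaussian forward process."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def x0_prediction_loss(predict_x0, x0, t, task_label, alpha_bar, rng):
    """Mean squared error between the predicted and the true clean data.

    predict_x0 is a hypothetical callable (x_t, t, task_label) -> x0 estimate;
    task_label stands in for the class label that selects the processing task.
    """
    x_t, _ = forward_noise(x0, t, alpha_bar, rng)
    x0_hat = predict_x0(x_t, t, task_label)
    return float(np.mean((x0_hat - x0) ** 2))

rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
x0 = rng.standard_normal((64, 32))  # toy "clean" gather: (time samples, traces)

# Hypothetical perfect model, used only to exercise the loss function.
oracle = lambda x_t, t, label: x0
loss = x0_prediction_loss(oracle, x0, t=500, task_label=0, alpha_bar=alpha_bar, rng=rng)
```

Because such a network outputs an estimate of the clean data at any step t, a single forward pass already yields a usable prediction, which is what makes single-step sampling possible in principle.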

Paper Structure

This paper contains 27 sections, 14 equations, 17 figures, 6 tables, 1 algorithm.

Figures (17)

  • Figure 1: An illustration of our network architecture. (a) The overall network structure. (b) Time embedding layer. (c) Class embedding layer. (e) The residual block. (d) The attention block.
  • Figure 2: Denoising performance comparison between our pre-trained GSFM and two benchmarks on synthetic data. (a) The clean and (b) noisy data, where the noisy data is created by injecting random noise with a level of 30% into the clean data. The denoised products from (c) our GSFM, (d) Benchmark 1, and (e) Benchmark 2. (f), (g), and (h) are the corresponding differences between the denoised results and the clean data.
  • Figure 3: Backscattered noise attenuation performance comparison between our pre-trained GSFM and two benchmarks on synthetic data. (a) The clean and (b) noisy data contaminated with backscattered noise. The denoised products from (c) our GSFM, (d) Benchmark 1, and (e) Benchmark 2. (f), (g), and (h) are the corresponding differences between the denoised results and the clean data.
  • Figure 4: Interpolation performance comparison between our pre-trained GSFM and two benchmarks on synthetic data. (a) The complete (label) and (b) incomplete data, where the incomplete data is created by randomly removing 50% of traces from the complete data. The interpolated products from (c) our GSFM, (d) Benchmark 1, and (e) Benchmark 2. (f), (g), and (h) are the corresponding differences between the interpolated results and the labeled data.
  • Figure 5: Low-frequency extrapolation performance comparison between our pre-trained GSFM and two benchmarks on synthetic data. (a) The labeled and (b) input data, where the input data lacks low frequencies below 4 Hz. The extrapolated products from (c) our GSFM, (d) Benchmark 1, and (e) Benchmark 2. (f), (g), and (h) are the corresponding differences between the extrapolated results and the labeled data.
  • ...and 12 more figures
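
The degraded inputs described in the captions of Figures 2, 4, and 5 (30% random noise, 50% of traces removed, frequencies below 4 Hz missing) could be generated from a clean gather roughly as sketched below. This is a hedged illustration, not the authors' data pipeline: the function names, the 4 ms sampling interval, the use of Gaussian noise, and the interpretation of "noise level" as a fraction of the data's standard deviation are all assumptions.

```python
import numpy as np

def add_random_noise(data, level, rng):
    """Add Gaussian noise whose std is `level` times the data std (assumed definition)."""
    return data + level * np.std(data) * rng.standard_normal(data.shape)

def remove_traces(data, fraction, rng):
    """Zero out a random subset of traces (columns); return data and removed indices."""
    n_traces = data.shape[1]
    n_remove = int(round(fraction * n_traces))
    removed = rng.choice(n_traces, size=n_remove, replace=False)
    out = data.copy()
    out[:, removed] = 0.0
    return out, np.sort(removed)

def remove_low_frequencies(data, dt, f_cut):
    """Zero all frequency components below f_cut (Hz) along the time axis."""
    spectrum = np.fft.rfft(data, axis=0)
    freqs = np.fft.rfftfreq(data.shape[0], d=dt)
    spectrum[freqs < f_cut, :] = 0.0
    return np.fft.irfft(spectrum, n=data.shape[0], axis=0)

rng = np.random.default_rng(0)
clean = rng.standard_normal((256, 40))  # toy gather: (time samples, traces)

noisy = add_random_noise(clean, level=0.30, rng=rng)             # cf. Figure 2(b)
incomplete, removed = remove_traces(clean, fraction=0.50, rng=rng)  # cf. Figure 4(b)
highpassed = remove_low_frequencies(clean, dt=0.004, f_cut=4.0)  # cf. Figure 5(b)
```

A brick-wall cut in the frequency domain is the simplest choice here; a tapered high-pass filter would reduce ringing and may be closer to what is used in practice.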