PyAWD: A Library for Generating Large Synthetic Datasets of Acoustic Wave Propagation

Pascal Tribel, Gianluca Bontempi

TL;DR

PyAWD tackles data sparsity in seismic ML by providing a Python library that generates large, high-resolution synthetic datasets of spatio-temporal acoustic wave propagation in 2D and 3D heterogeneous media. It solves the anisotropic nondispersive Acoustic Wave Equation $\frac{d^2u}{dt^2} = c\nabla^2 u - \alpha \frac{du}{dt} + f$ via Devito, where $u$ is the wave field, $c$ the wave speed field, $\alpha$ the coefficient of the damping term and $f$ the external force. It offers PyTorch-compatible datasets with on-the-fly generation and interrogator probes for ML pipelines. The authors demonstrate the library's utility with a 2D epicenter retrieval task in a Marmousi field, evaluating several ML models and performing data-budgeting analyses to reveal data requirements and model robustness; TCNN and Extra Trees emerge as top performers. Overall, PyAWD provides a practical path to generating rich, ML-ready seismic data, enabling exploration of model selection, data budgeting, and transfer learning. Future work will integrate real data and extend to more complex wave equations and source models.
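To make the equation above concrete, here is a minimal sketch of an explicit finite-difference integrator for it in plain NumPy. It is not PyAWD's API and does not use Devito; the function names (`laplacian`, `simulate`, `point_source`), the two-layer field, the Gaussian source pulse and the probe positions are all assumptions chosen for illustration. The coefficient `c` multiplies the Laplacian exactly as the equation is written, and values outside the grid are treated as zero, so waves reflect at the border.

```python
import numpy as np


def laplacian(u, dx):
    """Five-point Laplacian; values outside the grid are taken as zero."""
    padded = np.pad(u, 1)  # constant zero padding on every side
    return (
        padded[:-2, 1:-1] + padded[2:, 1:-1]
        + padded[1:-1, :-2] + padded[1:-1, 2:]
        - 4.0 * u
    ) / dx**2


def simulate(c, alpha, source, n_steps, dx, dt):
    """Integrate u_tt = c * lap(u) - alpha * u_t + f with central differences.

    c      : 2D array, coefficient multiplying the Laplacian
    alpha  : float, coefficient of the damping term
    source : callable mapping a step index to the 2D forcing field f
    Returns an array of shape (n_steps + 1, *c.shape) holding every frame.
    """
    # CFL bound for the 2D five-point scheme (propagation speed is sqrt(c)).
    if dt > dx / np.sqrt(2.0 * c.max()):
        raise ValueError("dt violates the CFL stability bound for this grid")
    frames = np.zeros((n_steps + 1, *c.shape))
    u_prev = np.zeros(c.shape)
    u = np.zeros(c.shape)
    damp = 0.5 * alpha * dt
    for n in range(n_steps):
        rhs = c * laplacian(u, dx) + source(n)
        # Central differences for both u_tt and u_t, solved for the next frame.
        u_next = (2.0 * u - (1.0 - damp) * u_prev + dt**2 * rhs) / (1.0 + damp)
        u_prev, u = u, u_next
        frames[n + 1] = u
    return frames


# Hypothetical heterogeneous medium: the lower half has a larger coefficient.
nx = ny = 64
dx = 1.0
c = np.ones((ny, nx))
c[ny // 2:, :] = 4.0
dt = 0.5 * dx / np.sqrt(2.0 * c.max())  # half of the CFL bound
epicenter = (20, 32)  # (row, column) of the assumed point source


def point_source(step):
    """Gaussian pulse in time, applied at the single epicenter cell."""
    f = np.zeros((ny, nx))
    t = step * dt
    f[epicenter] = np.exp(-((t - 2.0) / 0.5) ** 2)
    return f


frames = simulate(c, alpha=0.1, source=point_source, n_steps=200, dx=dx, dt=dt)

# Probes in the spirit of the interrogators: record the field at fixed cells.
interrogators = [(10, 10), (50, 50)]
traces = np.stack([frames[:, row, col] for row, col in interrogators])
print(traces.shape)  # (2, 201)
```

Each row of `traces` is the time series seen at one probe, which is the kind of input an epicenter retrieval model would consume. A production solver such as Devito would add higher-order stencils and absorbing boundaries, which this sketch deliberately omits.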

Abstract

Seismic data is often sparse and unevenly distributed due to the high costs and logistical challenges associated with deploying physical seismometers, limiting the application of Machine Learning (ML) in earthquake analysis. While simulation methods exist, no tool allows the generation of large datasets containing simulated measurements of the ground motion. To address this gap, we introduce PyAWD, a Python library designed to generate high-resolution synthetic datasets simulating spatio-temporal acoustic wave propagation in both two-dimensional and three-dimensional heterogeneous media. By allowing fine control over parameters such as the wave speed, external forces, spatial and temporal discretization, and media composition, PyAWD enables the creation of ML-scale datasets that capture the complexity of seismic wave behavior. We illustrate the library's potential with an epicenter retrieval task, showcasing its suitability for designing complex, accurate seismic problems that require advanced ML approaches when dense real-world data is scarce or unavailable. We also show the usefulness of our tool for tackling the problem of data budgeting in the framework of epicenter retrieval.

Paper Structure

This paper contains 8 sections, 4 equations, 9 figures.

Figures (9)

  • Figure 1: Example of scalar wave propagation simulation generated by PyAWD.
  • Figure 2: The Marmousi field, from (marmousi). The darker the color, the lower the propagation speed. This complex structure is used as a preset example in PyAWD.
  • Figure 3: Histogram of both $x$ and $y$ coordinates of the epicenters in the training set.
  • Figure 4: Example of propagating wave in the Marmousi field, with two interrogators.
  • Figure 5: Example of interrogator responses, which follow the simulation shown in Figure 4.
  • ...and 4 more figures