Synthetic Geology: Structural Geology Meets Deep Learning

Simon Ghyselincks, Valeriia Okhmak, Stefano Zampini, George Turkiyyah, David Keyes, Eldad Haber

TL;DR

This work presents StructuralGeo, a stochastic geology simulator used to generate large-scale synthetic 3D lithology datasets and train a 3D attention flow-matching model to reconstruct multiple plausible subsurface scenarios from surface and borehole data. By embedding discrete lithology labels into a continuous latent space, the model learns a velocity field that transports samples from a simple prior $\pi_0$ to a target distribution $\pi_{\mathbf m}$, enabling both unconditional generation and conditional reconstruction from $\pi_{\mathbf m|\mathbf d}$. The results demonstrate diverse unconditional geologies and conditional realizations that honor sparse observations, offering a probabilistic framework for inverse problems, uncertainty quantification, and integration with geophysical workflows. This approach provides a scalable, data-efficient prior for structural geology, with open-source code to enable community use, extension, and application to resource exploration and geohazard assessment.
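
For readers unfamiliar with flow matching, the transport described above can be written out under a common simplifying assumption: a straight-line interpolant between a prior sample and an embedded data sample. The symbols $\mathbf{x}_0$, $\mathbf{x}_1$, $\mathbf{x}_t$, and $v_\theta$ are introduced here for illustration only, and the paper's exact formulation may differ.

$$
\begin{aligned}
\mathbf{x}_t &= (1-t)\,\mathbf{x}_0 + t\,\mathbf{x}_1, \qquad \mathbf{x}_0 \sim \pi_0,\quad \mathbf{x}_1 \sim \pi_{\mathbf m},\quad t \sim \mathcal{U}[0,1], \\
\mathcal{L}(\theta) &= \mathbb{E}_{t,\,\mathbf{x}_0,\,\mathbf{x}_1} \left\| v_\theta(\mathbf{x}_t, t) - (\mathbf{x}_1 - \mathbf{x}_0) \right\|^2, \\
\frac{d\mathbf{x}}{dt} &= v_\theta(\mathbf{x}, t), \qquad \mathbf{x}(0) \sim \pi_0 \;\Rightarrow\; \mathbf{x}(1) \approx \text{a sample from } \pi_{\mathbf m}.
\end{aligned}
$$

In the conditional setting, the same construction would pass the observations $\mathbf d$ to the network as an extra input, $v_\theta(\mathbf{x}_t, t, \mathbf d)$, so that integrating the ODE yields samples from $\pi_{\mathbf m|\mathbf d}$.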

Abstract

Reconstructing the structural geology and mineral composition of the first few kilometers of the Earth's subsurface from sparse or indirect surface observations remains a long-standing challenge with critical applications in mineral exploration, geohazard assessment, and geotechnical engineering. This inherently ill-posed problem is often addressed by classical geophysical inversion methods, which typically yield a single maximum-likelihood model that fails to capture the full range of plausible geology. The adoption of modern deep learning methods has been limited by the lack of large 3D training datasets. We address this gap with StructuralGeo, a geological simulation engine that mimics eons of tectonic, magmatic, and sedimentary processes to generate a virtually limitless supply of realistic synthetic 3D lithological models. Using this dataset, we train both unconditional and conditional generative flow-matching models with a 3D attention U-Net architecture. The resulting foundation model can reconstruct multiple plausible 3D scenarios from surface topography and sparse borehole data, depicting structures such as layers, faults, folds, and dikes. By sampling many reconstructions from the same observations, we introduce a probabilistic framework for estimating the size and extent of subsurface features. While the realism of the output is bounded by the fidelity of the training data to true geology, this combination of simulation and generative AI functions offers a flexible prior for probabilistic modeling, regional fine-tuning, and use as an AI-based regularizer in traditional geophysical inversion workflows.
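
The idea of estimating feature size and extent from many reconstructions can be illustrated with a small sketch. It assumes the conditional samples are already available as an integer label array of shape `(n_samples, nx, ny, nz)`; the function names, the random stand-in data, and the four-label setup are hypothetical and not taken from the StructuralGeo code. The 60 m voxel edge follows from the figure captions ($3.84\,\text{km} / 64$).

```python
import numpy as np


def lithology_probability(samples: np.ndarray, label: int) -> np.ndarray:
    """Per-voxel fraction of samples in which `label` occurs."""
    return (samples == label).mean(axis=0)


def volume_quantiles(samples, label, voxel_volume_m3, qs=(0.05, 0.5, 0.95)):
    """Quantiles of the total volume occupied by `label` across samples."""
    n = samples.shape[0]
    voxel_counts = (samples == label).reshape(n, -1).sum(axis=1)
    return np.quantile(voxel_counts * voxel_volume_m3, qs)


# Stand-in for model output (assumption): 8 random label volumes, 4 rock types.
rng = np.random.default_rng(0)
samples = rng.integers(0, 4, size=(8, 16, 16, 16))

prob = lithology_probability(samples, label=2)  # shape (16, 16, 16)
vol = volume_quantiles(samples, label=2, voxel_volume_m3=60.0**3)
print("max voxel probability:", prob.max())
print("5%/50%/95% volume (m^3):", vol)
```

The per-voxel map shows where a rock type is likely to occur, while the volume quantiles summarize how large the feature is across the ensemble of plausible reconstructions.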

Paper Structure

This paper contains 23 sections, 18 equations, 9 figures.

Figures (9)

  • Figure 1: Example of geological history illustrating the application of transformations and depositions. Panels: (A) initial strata ${\bf m}_0({\bf x})$, (B) fold $\mathcal{T}_{\text{fold}}$, (C) dike $\mathcal{D}_{\text{dike}}$, and (D) fault $\mathcal{T}_{\text{fault}}$. Processes $\mathcal{P}_i$ are sequenced randomly from a set $\mathcal{P}_j$, $j=1,\ldots,k$, using a Markov chain, leading to varied outcomes.
  • Figure 2: Pipeline for random geological model generation. A batch request triggers a Markov chain sampler which selects and sequences geological processes (e.g., folding, deposition) from a transition matrix; parameters such as amplitude and wavelength are drawn from tuned random variables; the sequences are applied to produce batches of 3D tensors.
  • Figure 3: Overview of our conditional flow-matching architecture. The diagram is inspired by the Latent Diffusion Models paper (Rombach et al., 2022).
  • Figure 4: Three examples of unconditional earth models. Panels (a–c) show generative samples at $64 \times 64 \times 64$ resolution drawn from $\pi_{{\bf m}}$ using flow matching. Each model represents a $3.84\, \text{km} \, \times 3.84\, \text{km} \, \times 3.84\, \text{km}$ volume of the Earth's crust. Cut-away sections into the generated models show portions of the interior.
  • Figure 5: (a) A 3D geomodel unseen during training is used to produce sparse conditional data for the reconstructions in (b) and (c). The conditional data consist of 25 randomly placed vertical boreholes, each a single $(60 \times 60 \times 60)\,\text{m}^3$ voxel in diameter.
  • ...and 4 more figures
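
The Markov chain sequencing described in the captions of Figures 1 and 2 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the StructuralGeo implementation: the process list, the transition probabilities, and the parameter ranges below are all hypothetical placeholders.

```python
import numpy as np

# Hypothetical process set; the real simulator defines its own.
PROCESSES = ["deposition", "fold", "dike", "fault"]

# Hypothetical transition matrix: row i gives P(next process | current = i).
TRANSITIONS = np.array([
    [0.4, 0.3, 0.1, 0.2],
    [0.5, 0.1, 0.2, 0.2],
    [0.4, 0.2, 0.1, 0.3],
    [0.5, 0.2, 0.2, 0.1],
])


def sample_history(rng, length, start=0):
    """Draw a sequence of (process name, parameters) from the Markov chain."""
    state = start
    history = []
    for _ in range(length):
        name = PROCESSES[state]
        params = {}
        if name == "fold":
            # Placeholder ranges (assumption): parameters drawn from random variables.
            params = {
                "amplitude_m": rng.uniform(50.0, 500.0),
                "wavelength_m": rng.uniform(500.0, 3000.0),
            }
        history.append((name, params))
        # Choose the next process according to the current row of the matrix.
        state = rng.choice(len(PROCESSES), p=TRANSITIONS[state])
    return history


rng = np.random.default_rng(42)
for name, params in sample_history(rng, length=6):
    print(name, params)
```

Each sampled history would then be applied step by step to an initial strata volume to produce one synthetic model; repeating the draw with a fresh random state gives the varied outcomes the captions describe.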