Synthetic Geology: Structural Geology Meets Deep Learning
Simon Ghyselincks, Valeriia Okhmak, Stefano Zampini, George Turkiyyah, David Keyes, Eldad Haber
TL;DR
This work presents StructuralGeo, a stochastic geology simulator that generates large-scale synthetic 3D lithology datasets. These datasets are used to train a 3D attention flow-matching model that reconstructs multiple plausible subsurface scenarios from surface and borehole data. By embedding discrete lithology labels into a continuous latent space, the model learns a velocity field that transports samples from a simple prior $\pi_0$ to a target distribution $\pi_{\mathbf m}$, enabling both unconditional generation and conditional reconstruction from $\pi_{\mathbf m|\mathbf d}$. The results demonstrate diverse unconditional geologies and conditional realizations that honor sparse observations, offering a probabilistic framework for inverse problems, uncertainty quantification, and integration with geophysical workflows. This approach provides a scalable, data-efficient prior for structural geology, with open-source code to enable community use, extension, and application to resource exploration and geohazard assessment.
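The transport idea above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the one-hot embedding, the linear interpolation path between prior and target, the Euler integrator, and all function names are assumptions chosen for illustration, and an oracle velocity stands in for the trained 3D attention network.

```python
import numpy as np

def embed_labels(labels, num_classes):
    """Map integer lithology labels to continuous vectors in [-1, 1] (assumed one-hot embedding)."""
    one_hot = np.eye(num_classes)[labels]  # shape (..., num_classes)
    return 2.0 * one_hot - 1.0

def decode_labels(x):
    """Map continuous vectors back to discrete labels via the largest channel."""
    return np.argmax(x, axis=-1)

def flow_matching_pair(x0, x1, t):
    """Point on the assumed linear path from x0 to x1, plus the velocity regression target."""
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target

def flow_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return np.mean((v_pred - v_target) ** 2)

def euler_sample(velocity_fn, x0, num_steps=50):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with forward Euler."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x

rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=(8, 8, 8))   # toy 3D lithology volume with 4 rock types
x1 = embed_labels(labels, num_classes=4)      # sample from the target distribution
x0 = rng.standard_normal(x1.shape)            # sample from a Gaussian prior

# Training-time quantities for one random time t.
x_t, v_target = flow_matching_pair(x0, x1, t=0.3)

# Oracle velocity for this single (x0, x1) pair; a trained network would replace it.
oracle_velocity = lambda x, t: x1 - x0
x_final = euler_sample(oracle_velocity, x0)
recovered = decode_labels(x_final)
```

With the oracle velocity, integration lands on the embedded target and decoding recovers the original labels. In practice the velocity would be a learned function of `x`, `t`, and, for conditional reconstruction, the observed data.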
Abstract
Reconstructing the structural geology and mineral composition of the first few kilometers of the Earth's subsurface from sparse or indirect surface observations remains a long-standing challenge with critical applications in mineral exploration, geohazard assessment, and geotechnical engineering. This inherently ill-posed problem is often addressed by classical geophysical inversion methods, which typically yield a single maximum-likelihood model that fails to capture the full range of plausible geology. The adoption of modern deep learning methods has been limited by the lack of large 3D training datasets. We address this gap with StructuralGeo, a geological simulation engine that mimics eons of tectonic, magmatic, and sedimentary processes to generate a virtually limitless supply of realistic synthetic 3D lithological models. Using this dataset, we train both unconditional and conditional generative flow-matching models with a 3D attention U-Net architecture. The resulting foundation model can reconstruct multiple plausible 3D scenarios from surface topography and sparse borehole data, depicting structures such as layers, faults, folds, and dikes. By sampling many reconstructions from the same observations, we introduce a probabilistic framework for estimating the size and extent of subsurface features. While the realism of the output is bounded by the fidelity of the training data to true geology, this combination of simulation and generative AI offers a flexible prior for probabilistic modeling, regional fine-tuning, and use as an AI-based regularizer in traditional geophysical inversion workflows.
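As a rough illustration of how repeated sampling can yield estimates of feature size and extent, the sketch below computes a per-voxel occurrence probability and a volume distribution for one lithology across an ensemble. The array layout, function names, and voxel volume are hypothetical, and random integers stand in for reconstructions that would come from the conditional model.

```python
import numpy as np

def class_probability(samples, class_id):
    """Per-voxel fraction of samples assigning class_id; samples has shape (N, X, Y, Z)."""
    return (samples == class_id).mean(axis=0)

def volume_distribution(samples, class_id, voxel_volume=1.0):
    """Volume occupied by class_id in each of the N samples."""
    counts = (samples == class_id).reshape(samples.shape[0], -1).sum(axis=1)
    return counts * voxel_volume

# Placeholder ensemble: 100 label volumes on a 16^3 grid with 3 lithologies.
# Real use would draw these from the conditional generative model instead.
rng = np.random.default_rng(0)
samples = rng.integers(0, 3, size=(100, 16, 16, 16))

prob = class_probability(samples, class_id=2)     # extent: probability map per voxel
vols = volume_distribution(samples, class_id=2)   # size: one volume per sample
p10, p50, p90 = np.percentile(vols, [10, 50, 90]) # summary of the volume uncertainty
```

The probability map indicates where a feature is likely to occur, while the percentiles summarize the uncertainty in its total volume.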
