Well log data generation and imputation using sequence-based generative adversarial networks
Abdulrahman Al-Fakih, A. Koeshidayatullah, Tapan Mukerji, Sadam Al-Azani, SanLinn I. Kaka
TL;DR
This work tackles the challenge of gaps and uncertainties in well log data by proposing a dual-GAN framework that combines Time Series GAN (TSGAN) for synthetic data generation and Sequence GAN (SeqGAN) for imputation of missing values. The approach is evaluated on North Sea LAS datasets, with comparisons to BRITS and NAOMI showing strong performance in both data synthesis ($R^2 \approx 0.92$) and sequential imputation. The study validates fidelity between real and synthetic data both statistically and visually, using KS tests, Pearson correlations, KL divergences, and PCA/t-SNE visualizations. Overall, the dual framework enhances data completeness and reliability in geosciences, offering a practical pathway to improved reservoir characterization in data-sparse subsurface environments.
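To make the statistical checks named above concrete, the following is a minimal sketch of a two-sample KS statistic, a Pearson correlation, and a histogram-based KL divergence between a real and a synthetic curve. How the study applies each statistic is not specified in this summary, so this is only one plausible reading. The function names, the bin count, and the smoothing constant `eps` are illustrative assumptions, not the authors' implementation.

```python
import bisect
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The supremum of |F_a - F_b| is attained at one of the observed values.
    for x in a_sorted + b_sorted:
        fa = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return cov / math.sqrt(vx * vy)


def histogram_probs(values, lo, hi, bins, eps):
    """Smoothed histogram probabilities on [lo, hi]; assumes hi > lo."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = max(0, min(bins - 1, int((v - lo) / width)))
        counts[idx] += 1
    total = len(values) + eps * bins
    # Additive smoothing (an assumption here) keeps every bin strictly positive.
    return [(c + eps) / total for c in counts]


def kl_divergence(real, synth, bins=20, eps=1e-9):
    """KL(real || synth) estimated from histograms over a shared value range."""
    lo = min(min(real), min(synth))
    hi = max(max(real), max(synth))
    p = histogram_probs(real, lo, hi, bins, eps)
    q = histogram_probs(synth, lo, hi, bins, eps)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

A KS statistic near 0 and a KL divergence near 0 both indicate that the synthetic values are distributed like the real ones. Neither says anything about sequence order, which is why visual and correlation checks are typically reported alongside them.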
Abstract
Well log analysis is crucial for hydrocarbon exploration, providing detailed insights into subsurface geological formations. However, gaps and inaccuracies in well log data, often due to equipment limitations, operational challenges, and harsh subsurface conditions, can introduce significant uncertainties in reservoir evaluation. Addressing these challenges requires effective methods for both synthetic data generation and precise imputation of missing data to ensure data completeness and reliability. This study introduces a novel framework that uses sequence-based generative adversarial networks (GANs) designed specifically for well log data generation and imputation. The framework integrates two distinct sequence-based GAN models: Time Series GAN (TSGAN) for generating synthetic well log data and Sequence GAN (SeqGAN) for imputing missing data. Both models were tested on a dataset from the North Sea, Netherlands region, using sections of 5, 10, and 50 data points. Experimental results demonstrate that this approach fills data gaps more accurately than other deep learning models for spatial series analysis. The method yielded $R^2$ values of 0.921, 0.899, and 0.594, with corresponding mean absolute percentage error (MAPE) values of 8.320, 0.005, and 151.154, and mean absolute error (MAE) values of 0.012, 0.005, and 0.032, respectively. These results set a new benchmark for data integrity and utility in geosciences, particularly in well log data analysis.
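The three metrics reported above can be illustrated with a minimal sketch that masks a section of a log, fills it, and scores only the masked samples. Everything here is an assumption for illustration: `score_gap` and `linear_interpolate` are hypothetical names, the log is a synthetic sine curve, MAPE is expressed in percent, and linear interpolation is a simple stand-in for the imputer, not SeqGAN.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; zero-valued truths are skipped."""
    terms = [abs((t - p) / t) for t, p in zip(y_true, y_pred) if t != 0]
    return 100.0 * sum(terms) / len(terms)


def r2(y_true, y_pred):
    """Coefficient of determination; assumes y_true is not constant."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def linear_interpolate(series):
    """Stand-in imputer (NOT SeqGAN): fill runs of None by linear interpolation.

    Runs touching either end are filled with the nearest observed value.
    Assumes at least one observed value in the series.
    """
    out = list(series)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        j = i
        while j < n and out[j] is None:
            j += 1
        left = out[i - 1] if i > 0 else None
        right = out[j] if j < n else None
        if left is None:
            left = right
        if right is None:
            right = left
        span = j - i + 1
        for k in range(i, j):
            out[k] = left + (right - left) * (k - i + 1) / span
        i = j
    return out


def score_gap(log, imputer, start, length):
    """Mask log[start:start+length], impute, and score on the masked samples only."""
    masked = list(log)
    for i in range(start, start + length):
        masked[i] = None
    filled = imputer(masked)
    truth = log[start:start + length]
    pred = filled[start:start + length]
    return {"r2": r2(truth, pred), "mae": mae(truth, pred), "mape": mape(truth, pred)}


if __name__ == "__main__":
    # Synthetic curve used purely for illustration, not North Sea data.
    log = [2.3 + 0.2 * math.sin(i / 8.0) for i in range(200)]
    for length in (5, 10, 50):
        print(length, score_gap(log, linear_interpolate, 75, length))
```

Scoring only the masked samples keeps the observed values from inflating the metrics. MAPE divides by the true value, so it can become very large when the true values are close to zero, even if MAE stays small.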
