Benchmark Dataset for Pore-Scale CO2-Water Interaction
Alhasan Abdellatif, Hannah P. Menke, Julien Maes, Ahmed H. Elsheikh, Florian Doster
TL;DR
The paper supports benchmarking of ML surrogates for pore-scale CO2–water displacement in heterogeneous porous media by providing a high-resolution, time-resolved dataset. It introduces 624 2D samples, each $512\times512$ at a resolution of $35\,\mu$m, capturing 100 time steps under a constant CO2 injection rate of $1\times10^{-8}$ m$^3$/s, with five heterogeneity levels realized via grain-size perturbations. Outputs per sample include $\alpha_{water}$, $p$, $pc$, $U_x$, $U_y$, and a binary domain. Porosity, permeability, and relative permeability are also provided. The data are stored in HDF5 and complemented by CSV files. A U-Net–based autoregressive forecasting experiment shows that training on more diverse, multi-level heterogeneity improves average generalization to unseen level-5 domains: the 5-Level model achieves the lowest $MSE$, though some samples exhibit biases.
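The autoregressive forecasting and $MSE$ scoring mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: `rollout`, `mse`, and `toy_step` are hypothetical names, and `toy_step` is a trivial stand-in for a trained U-Net, used only so the example runs on its own.

```python
import numpy as np

def rollout(step_fn, initial_state, n_steps):
    """Apply step_fn repeatedly, feeding each prediction back in as the next input."""
    states = [initial_state]
    for _ in range(n_steps):
        states.append(step_fn(states[-1]))
    # Drop the initial state; return only the predicted steps, stacked along axis 0.
    return np.stack(states[1:])

def mse(pred, target):
    """Mean squared error over all time steps and grid cells."""
    return float(np.mean((pred - target) ** 2))

def toy_step(state):
    # Hypothetical one-step model (assumption): halves every value.
    # A trained U-Net would take its place in a real experiment.
    return 0.5 * state

x0 = np.ones((4, 4))                   # small stand-in for a 512x512 field
pred = rollout(toy_step, x0, n_steps=3)
print(pred.shape)                      # (3, 4, 4)
print(mse(pred, np.zeros_like(pred)))  # error against an all-zero reference
```

Because each prediction becomes the next input, errors can compound over the rollout, which is one reason long-horizon $MSE$ is a demanding test for a surrogate.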
Abstract
Accurately capturing the complex interaction between CO2 and water in porous media at the pore scale is essential for various geoscience applications, including carbon capture and storage (CCS). We introduce a comprehensive dataset, generated from high-fidelity numerical simulations, that resolves this interaction in space and time. The dataset consists of 624 2D samples, each of size 512×512 with a resolution of 35 μm, covering 100 time steps under a constant CO2 injection rate. It includes several levels of heterogeneity, represented by different grain sizes with random variation in spacing, offering a robust testbed for developing predictive models. The resulting high-resolution temporal and spatial information is crucial for benchmarking machine learning models.
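As a quick sanity check on these figures, the short sketch below works out the physical extent of one sample and the number of values in one output field. It assumes that "resolution of 35 μm" means the edge length of a single grid cell; the variable names are illustrative and do not come from the dataset.

```python
# Figures quoted in the abstract.
N_SAMPLES = 624
GRID = 512              # cells per side
RESOLUTION_UM = 35.0    # assumed to be the edge length of one cell, in micrometres
N_STEPS = 100

# Physical side length of one sample, in millimetres.
side_mm = GRID * RESOLUTION_UM / 1000.0

# Number of cells in one snapshot, and number of values in one
# time-resolved field across the whole dataset.
cells_per_snapshot = GRID * GRID
values_per_field = N_SAMPLES * N_STEPS * cells_per_snapshot

print(f"side length: {side_mm:.2f} mm")
print(f"cells per snapshot: {cells_per_snapshot}")
print(f"values per field: {values_per_field}")
```

Under that assumption each sample spans roughly 17.92 mm per side, and a single time-resolved field holds about 1.6 × 10^10 values across all samples and time steps.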
