Integrating Score-Based Diffusion Models with Machine Learning-Enhanced Localization for Advanced Data Assimilation in Geological Carbon Storage
Gabriel Serrão Seabra, Nikolaj T. Mücke, Vinicius Luiz Santos Silva, Alexandre A. Emerick, Denis Voskov, Femke Vossepoel
TL;DR
This work tackles uncertainty quantification for geological carbon storage (GCS) in highly channelized reservoirs, where traditional ensemble methods struggle with covariance estimation and geologic realism. It introduces a framework that combines score-based diffusion models, which generate large, geologically consistent super-ensembles, with ML-enhanced localization, which produces reliable, channel-respecting covariance estimates within the Ensemble Smoother with Multiple Data Assimilation (ESMDA). Empirical results in a 2D channelized CO$_2$ storage setting show that ML-based localization preserves up to ~40% more ensemble variance than standard tapers and concentrates updates along high-permeability channels, improving data matching while maintaining geological realism. The approach is computationally efficient and practical for GCS applications, with clear avenues for extending to 3D, incorporating additional data types, and enabling online learning of localization proxies.
Abstract
Accurate characterization of subsurface heterogeneity is important for the safe and effective implementation of geological carbon storage (GCS) projects. This paper explores how machine learning (ML) methods can enhance data assimilation for GCS, using a framework that integrates score-based diffusion models with ML-enhanced localization in channelized reservoirs during CO$_2$ injection. The localization uses large ensembles ($N_s = 5000$), with permeabilities generated by the diffusion model and states computed by simple ML algorithms, to improve covariance estimation for the Ensemble Smoother with Multiple Data Assimilation (ESMDA). We apply the ML algorithms to a prior ensemble of channelized permeability fields generated with the geostatistical model FLUVSIM, and apply the approach to a CO$_2$ injection scenario simulated using the Delft Advanced Research Terra Simulator (DARTS). Our ML-based localization maintains significantly more ensemble variance than when localization is not applied, while achieving comparable data-matching quality. This framework has practical implications for GCS projects, helping improve the reliability of uncertainty quantification for risk assessment.
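To make the role of localization in the ESMDA update concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a localized update of the form $m_j^{a} = m_j + (R \circ C_{MD})(C_{DD} + \alpha C_E)^{-1}(d_{obs} + \sqrt{\alpha}\, e_j - d_j)$, where $R$ is a localization matrix applied elementwise to the cross-covariance. The function names (`cross_correlation`, `localization_from_super_ensemble`, `esmda_step`), the diagonal observation-error covariance, and the particular taper formula that turns super-ensemble correlations into weights are all assumptions chosen for illustration; the paper's own localization rule may differ.

```python
import numpy as np


def cross_correlation(M, D):
    """Sample cross-correlation between parameters M (Nm, N) and data D (Nd, N)."""
    Ma = M - M.mean(axis=1, keepdims=True)
    Da = D - D.mean(axis=1, keepdims=True)
    cov = Ma @ Da.T / (M.shape[1] - 1)
    denom = np.outer(Ma.std(axis=1, ddof=1), Da.std(axis=1, ddof=1))
    # Entries with zero variance get correlation 0 instead of NaN.
    return np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)


def localization_from_super_ensemble(M_super, D_super, n_e):
    """Illustrative taper (an assumption): weights in [0, 1) that shrink
    toward 0 where the super-ensemble correlation is weak relative to the
    sampling noise expected for an assimilation ensemble of size n_e."""
    rho2 = cross_correlation(M_super, D_super) ** 2
    return rho2 / (rho2 + (rho2 + 1.0) / n_e)


def esmda_step(M, D, d_obs, c_e_diag, alpha, R, rng):
    """One ESMDA update with elementwise (Schur-product) localization.

    M: (Nm, Ne) parameters, D: (Nd, Ne) predicted data, d_obs: (Nd,),
    c_e_diag: (Nd,) observation-error variances (diagonal C_E assumed),
    alpha: inflation coefficient, R: (Nm, Nd) localization matrix.
    """
    n_e = M.shape[1]
    Ma = M - M.mean(axis=1, keepdims=True)
    Da = D - D.mean(axis=1, keepdims=True)
    C_md = Ma @ Da.T / (n_e - 1)
    C_dd = Da @ Da.T / (n_e - 1)
    # Perturb observations with inflated noise: sqrt(alpha) * C_E^{1/2} * z.
    noise = np.sqrt(alpha * c_e_diag)[:, None] * rng.standard_normal(D.shape)
    innovation = d_obs[:, None] + noise - D
    # Solve the linear system rather than forming an explicit inverse.
    weights = np.linalg.solve(C_dd + alpha * np.diag(c_e_diag), innovation)
    return M + (R * C_md) @ weights
```

In a workflow like the one described above, `M_super` would hold the large diffusion-generated permeability ensemble and `D_super` the corresponding proxy-predicted data, while `M` and `D` come from the much smaller ensemble that is actually run through the simulator. Setting `R` to all ones recovers the unlocalized update, and setting it to zeros leaves the ensemble unchanged.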
