Optimization of geological carbon storage operations with multimodal latent dynamic model and deep reinforcement learning
Zhongzheng Wang, Yuntian Chen, Guodong Chen, Dongxiao Zhang
TL;DR
This work addresses the computational bottleneck of optimizing geological carbon storage (GCS) operations by introducing the multimodal latent dynamic (MLD) model, a three-module surrogate that compresses multimodal inputs into a latent space, evolves the latent states, and predicts flow responses. Coupling the MLD surrogate with soft actor-critic (SAC) deep reinforcement learning (DRL) yields a framework (MSDRL) for fast, generalizable policy optimization that maximizes net present value (NPV) over the project horizon. The study reports strong forward-model accuracy and higher NPV than gradient-free and surrogate-based baselines while reducing computation by over 60%, with evidence of generalization to unseen geological scenarios. The framework offers a scalable pathway for data-driven optimization in subsurface energy systems, potentially in online settings, and can be extended to incorporate constraints and alternative DRL algorithms.
Abstract
Maximizing storage performance in geological carbon storage (GCS) is crucial for commercial deployment, but traditional optimization relies on resource-intensive simulations and is therefore computationally expensive. This study introduces the multimodal latent dynamic (MLD) model, a deep learning framework for fast flow prediction and well control optimization in GCS. The MLD model comprises a representation module that produces compressed latent representations, a transition module that evolves the system state, and a prediction module that outputs flow responses. A novel training strategy combining a regression loss with a joint-embedding consistency loss enhances temporal consistency and multi-step prediction accuracy. Unlike existing models, the MLD model supports diverse input modalities, allowing comprehensive data interactions. Because the MLD model has the structure of a Markov decision process (MDP), it can serve as the environment for training deep reinforcement learning agents; here the soft actor-critic (SAC) algorithm is used to maximize net present value (NPV) through continuous interaction with the model. The approach outperforms traditional methods, achieving the highest NPV while reducing computational resources by over 60%. It also demonstrates strong generalization performance, providing improved decisions for new scenarios based on knowledge from previous ones.
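To make the three-module structure and the combined training loss concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. Everything in it is an assumption made for illustration: the dimensions, the single-layer linear maps standing in for deep networks, the function names (`represent`, `transition`, `predict`, `rollout_loss`), the mean-squared-error form of both loss terms, and the `consistency_weight` parameter. The consistency term here compares the latent state rolled forward by the transition module with the latent state obtained by encoding the next observation, which is one common way to realize a joint-embedding consistency loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: observation, well-control, latent, and flow-response dimensions.
OBS_DIM, ACT_DIM, LATENT_DIM, OUT_DIM = 16, 2, 4, 3

# Linear stand-ins for the three modules (assumption: the real ones are deep networks).
W_repr = rng.normal(scale=0.1, size=(OBS_DIM, LATENT_DIM))                # representation
W_trans = rng.normal(scale=0.1, size=(LATENT_DIM + ACT_DIM, LATENT_DIM))  # transition
W_pred = rng.normal(scale=0.1, size=(LATENT_DIM, OUT_DIM))                # prediction


def represent(obs):
    """Compress an observation into a latent state."""
    return np.tanh(obs @ W_repr)


def transition(latent, action):
    """Advance the latent state by one control step given the well controls."""
    return np.tanh(np.concatenate([latent, action]) @ W_trans)


def predict(latent):
    """Map a latent state to flow responses."""
    return latent @ W_pred


def rollout_loss(obs_seq, act_seq, flow_seq, consistency_weight=1.0):
    """Multi-step loss: regression on flow responses plus latent consistency.

    obs_seq:  (T + 1, OBS_DIM) observations at steps 0..T
    act_seq:  (T, ACT_DIM) controls applied at steps 0..T-1
    flow_seq: (T, OUT_DIM) reference flow responses at steps 1..T
    """
    latent = represent(obs_seq[0])
    regression, consistency = 0.0, 0.0
    for t in range(len(act_seq)):
        # Roll the latent state forward without re-encoding the observation.
        latent = transition(latent, act_seq[t])
        regression += np.mean((predict(latent) - flow_seq[t]) ** 2)
        # Consistency: the rolled-out latent should match the encoded next observation.
        consistency += np.mean((latent - represent(obs_seq[t + 1])) ** 2)
    steps = len(act_seq)
    return regression / steps + consistency_weight * consistency / steps


# Usage on random placeholder data (not simulation output).
T = 5
obs_seq = rng.normal(size=(T + 1, OBS_DIM))
act_seq = rng.uniform(-1.0, 1.0, size=(T, ACT_DIM))
flow_seq = rng.normal(size=(T, OUT_DIM))
loss = rollout_loss(obs_seq, act_seq, flow_seq)
```

In this sketch the latent state is encoded once and then propagated only by the transition module, which is what makes the loss a multi-step one. In a gradient-based implementation the consistency target would typically be computed without backpropagating through it; that detail is also an assumption and is not stated in the abstract.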
