Optimization of geological carbon storage operations with multimodal latent dynamic model and deep reinforcement learning

Zhongzheng Wang, Yuntian Chen, Guodong Chen, Dongxiao Zhang

TL;DR

This work addresses the computational cost of optimizing geological carbon storage (GCS) operations by introducing the multimodal latent dynamic (MLD) model, a three-module surrogate that compresses multimodal inputs into a latent space, evolves the latent state, and predicts flow responses. A soft actor-critic (SAC) agent is trained against the MLD surrogate rather than a full simulator; the combined workflow, MSDRL, searches for well-control policies that maximize net present value ($NPV$) over the project horizon. The study reports accurate forward predictions, higher $NPV$ than gradient-free and surrogate-based baselines, and a reduction in computational cost of over 60%, with evidence that the learned policy generalizes to unseen geological scenarios. The framework offers a scalable pathway for data-driven, online-capable optimization in subsurface energy systems and can be extended to incorporate constraints and alternative DRL algorithms.

Abstract

Maximizing storage performance in geological carbon storage (GCS) is crucial for commercial deployment, but traditional optimization demands resource-intensive simulations, posing computational challenges. This study introduces the multimodal latent dynamic (MLD) model, a deep learning framework for fast flow prediction and well control optimization in GCS. The MLD model includes a representation module for compressed latent representations, a transition module for system state evolution, and a prediction module for flow responses. A novel training strategy combining regression loss and joint-embedding consistency loss enhances temporal consistency and multi-step prediction accuracy. Unlike existing models, the MLD supports diverse input modalities, allowing comprehensive data interactions. The MLD model, resembling a Markov decision process (MDP), can train deep reinforcement learning agents, specifically using the soft actor-critic (SAC) algorithm, to maximize net present value (NPV) through continuous interactions. The approach outperforms traditional methods, achieving the highest NPV while reducing computational resources by over 60%. It also demonstrates strong generalization performance, providing improved decisions for new scenarios based on knowledge from previous ones.
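
For readers unfamiliar with the objective, a generic discounted-cash-flow form of NPV is sketched here. This summary does not state the paper's exact economic model, so every symbol is an assumption made for illustration: $R_t$ and $C_t$ are the revenue and cost accrued during control step $t$, $b$ is the annual discount rate, $\tau_t$ is the elapsed time in years at step $t$, and $T$ is the number of control steps.

$$
NPV = \sum_{t=1}^{T} \frac{R_t - C_t}{(1 + b)^{\tau_t}}
$$

Under this form, treating each summand as the per-step reward makes the undiscounted episode return equal to $NPV$, which is one common way to pose such an objective as a reinforcement learning problem.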

Paper Structure (22 sections, 26 equations, 16 figures, 8 tables, 2 algorithms)

Figures (16)

  • Figure 1: Schematic of agent–environment interaction in an MDP.
  • Figure 2: Summary of the MLD model. The MLD model consists of a representation module, a transition module, and a prediction module. The parameters of each module are shared at discrete time steps of an episode to improve efficiency. At the training stage, three modules are jointly optimized by minimizing the regression and consistency loss function. At the test stage (solid gray line), only the first state input is required and the subsequent predictions are made entirely in the latent space.
  • Figure 3: Network architectures of three components. (a) Representation module, implemented as a fused encoder. CNN and MLP are used to extract information from different branches. (b) Transition module, implemented as an MLP. (c) Prediction module, implemented as an MLP. FC and Conv denote the fully connected layer and convolutional layer, respectively. ReLU stands for Rectified Linear Unit, which is a commonly used activation function in deep learning models.
  • Figure 4: Structure of the SAC agent. The SAC agent contains an actor and a critic. For the actor, the input is the state and the output is the action. For the critic, the input is the state-action pair and the output is the Q-value.
  • Figure 5: Workflow of the MSDRL algorithm. The learned MLD model serves as an interactive environment to train a SAC agent that learns control policies entirely from the predictions in the compact latent space.
  • ...and 11 more figures
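
The three-module structure described in Figures 2 and 3 can be illustrated with a minimal NumPy sketch of a latent rollout. It is not the authors' implementation: the layer sizes, the random untrained weights, the single linear layer standing in for the CNN branch, and the mean-squared form of both loss terms are all assumptions made for illustration. The function names (`represent`, `transition`, `predict`, `rollout`, `training_loss`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """Random weights for one fully connected layer (untrained, illustrative)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical sizes, not taken from the paper.
GRID, N_SCALAR, N_ACTION, N_LATENT, N_OUT = 8 * 8, 4, 3, 16, 2

W_field, b_field = dense(GRID, N_LATENT)       # linear stand-in for the CNN branch
W_scal, b_scal = dense(N_SCALAR, N_LATENT)     # branch for vector-valued inputs
W_fuse, b_fuse = dense(2 * N_LATENT, N_LATENT)
W_trans, b_trans = dense(N_LATENT + N_ACTION, N_LATENT)
W_pred, b_pred = dense(N_LATENT + N_ACTION, N_OUT)

def represent(field, scalars):
    """Representation module: fuse a 2D field and a vector into one latent state."""
    h_field = relu(field.reshape(-1) @ W_field + b_field)
    h_scal = relu(scalars @ W_scal + b_scal)
    return np.tanh(np.concatenate([h_field, h_scal]) @ W_fuse + b_fuse)

def transition(z, action):
    """Transition module: next latent state from current latent state and action."""
    return np.tanh(np.concatenate([z, action]) @ W_trans + b_trans)

def predict(z, action):
    """Prediction module: flow responses from latent state and action."""
    return np.concatenate([z, action]) @ W_pred + b_pred

def rollout(field0, scalars0, actions):
    """Test-time rollout: encode only the first state, then stay in latent space."""
    z = represent(field0, scalars0)
    latents, outputs = [], []
    for action in actions:
        outputs.append(predict(z, action))
        z = transition(z, action)
        latents.append(z)
    return np.array(latents), np.array(outputs)

def training_loss(latents, outputs, target_latents, target_outputs, weight=1.0):
    """Regression term plus a consistency term (assumed MSE on both)."""
    regression = np.mean((outputs - target_outputs) ** 2)
    consistency = np.mean((latents - target_latents) ** 2)
    return regression + weight * consistency

# Usage: five control steps from one initial state.
latents, outputs = rollout(np.ones((8, 8)), np.ones(N_SCALAR), np.zeros((5, N_ACTION)))
print(latents.shape, outputs.shape)
```

In a training setting, `target_latents` would typically come from encoding the true next states with the representation module, so that the consistency term pulls predicted and encoded latent states together; that pairing is an assumption about how a joint-embedding consistency loss is usually set up, not a detail confirmed by this summary.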