Optimal Sequential Decision-Making in Geosteering: A Reinforcement Learning Approach

Ressi Bonti Muhammad, Sergey Alyaev, Reidar Brumer Bratvold

TL;DR

This work formulates geosteering as a sequential decision-making problem and applies model-free reinforcement learning (RL), via Deep Q-Networks, to optimize the decisions. In two synthetic, previously studied geosteering scenarios, RL achieves performance close to that of quasi-optimal approximate dynamic programming (ADP) while substantially reducing the computation needed for online decision support. The study demonstrates two RL variants: RL-Posterior (Bayesian-informed state) and RL-Sensor (sensor-based state), with RL-Sensor offering a favorable balance of performance and speed. The findings suggest that RL can adapt flexibly to more complex, data-rich geosteering environments and could potentially be trained on real data, enabling scalable, real-time decision support in subsurface operations.

Abstract

Trajectory adjustment decisions throughout the drilling process, called geosteering, affect subsequent choices and information gathering, thus resulting in a coupled sequential decision problem. Previous works on decision optimization in geosteering rely on greedy optimization or approximate dynamic programming (ADP). Both methods require explicit models of the uncertainty and the objective function, which makes developing decision optimization methods for complex and realistic geosteering environments challenging or even impossible. We use the Deep Q-Network (DQN) method, a model-free reinforcement learning (RL) method that learns directly from the decision environment, to optimize geosteering decisions. The expensive computations for RL are handled during the offline training stage. Evaluating the trained DQN, as needed for real-time decision support, takes milliseconds and is faster than the traditional alternatives. Moreover, for two previously published synthetic geosteering scenarios, our results show that RL achieves high-quality outcomes comparable to the quasi-optimal ADP. Furthermore, because RL is model-free, replacing the training environment lets us extend it to problems where the ADP solution is prohibitively expensive to compute. This flexibility will allow RL to be applied to more complex environments and, in the future, to hybrid versions trained with real data.
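
To make the offline-training versus online-evaluation split concrete, the following is a minimal sketch of the two computations a standard DQN relies on: the regression target used during training and the greedy action choice used at decision time. It is not the authors' implementation. The function names, the array layout, and the discount factor `gamma` are illustrative assumptions, and the Q-values are passed in as plain arrays rather than produced by a trained network.

```python
import numpy as np


def dqn_targets(rewards, next_q_values, dones, gamma):
    """Standard DQN targets: r + gamma * max_a' Q_target(s', a').

    rewards:       shape (batch,)
    next_q_values: shape (batch, n_actions), Q-values of the next states
    dones:         shape (batch,), True where the episode ended
    gamma:         discount factor (hypothetical value chosen by the caller)
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q_values = np.asarray(next_q_values, dtype=float)
    dones = np.asarray(dones, dtype=bool)
    best_next = next_q_values.max(axis=1)
    # Terminal transitions have no future value to bootstrap from.
    return rewards + gamma * np.where(dones, 0.0, best_next)


def greedy_action(q_values):
    """Online decision: pick the action with the largest Q-value."""
    return int(np.argmax(q_values))


targets = dqn_targets(
    rewards=[1.0, 2.0],
    next_q_values=[[0.5, 1.5], [3.0, 0.0]],
    dones=[False, True],
    gamma=0.9,
)
print(targets)
print(greedy_action([0.1, 0.7, 0.3]))
```

The target computation is needed only during training, whereas `greedy_action` is a single arg-max over the network output, which is consistent with the abstract's point that evaluation is cheap once training is done.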

Paper Structure

This paper contains 22 sections, 12 equations, 10 figures, and 7 tables.

Figures (10)

  • Figure 1: Key components required for initializing the RL agent and geosteering environment before training. This setup includes defining state and action spaces, establishing the reward function, and incorporating prior geological data.
  • Figure 2: Extended RL training flowchart, building on the standard agent-environment interaction. Uncertain variables introduce variability at the start of each episode, simulating diverse geological conditions. Experience replay memory stores interactions for batch training, enhancing stability and generalization across scenarios.
  • Figure 3: General neural network architecture used in the study, generated using LeNail2019. The network has two hidden layers, takes the state as input, and outputs the Q-value of each available action. The input and output layer configurations, which vary with the state and action spaces of each environment, are detailed in Table \ref{tab:RL1state} and in the last paragraph of subsection \ref{rlset2}.
  • Figure 4: Illustration of the geosteering scenario, redrawn after Kullawan2014-2. At each of the n = 10 discretization points, a decision is made that determines the well trajectory. The blue line represents the well path, while the red dashed line denotes the boundary between the high- and low-quality reservoir zones. The thickness of the reservoir at each discretization point is represented by $h$, while $DTUB$ and $DTLB$ denote the distances from the well to the upper and lower reservoir boundaries, respectively. Additionally, $DTHQ$ denotes the distance from the well to the high-quality zone.
  • Figure 5: Evolution of the individual objectives (reservoir contact and high-quality zone percentages) and the overall rewards of the two RL agents during training. The red lines represent the RL-Posterior method, while the blue lines represent the RL-Sensor method. Each curve shows the average over the last 100 training episodes.
  • ...and 5 more figures
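
The experience replay memory mentioned in the Figure 2 caption and the two-hidden-layer Q-network mentioned in the Figure 3 caption can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's code. The hidden width of 64, the ReLU activation, the weight initialization, and all class and function names are hypothetical, since the captions specify only that there are two hidden layers with state inputs and one Q-value output per action.

```python
import random
from collections import deque

import numpy as np


class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement for one training batch.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def init_q_network(state_dim, n_actions, hidden=(64, 64), seed=0):
    """Create weights for state -> hidden -> hidden -> Q-values.

    The hidden sizes and the scaled-normal initialization are assumptions.
    """
    rng = np.random.default_rng(seed)
    sizes = [state_dim, *hidden, n_actions]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def q_values(params, state):
    """Forward pass: ReLU on the hidden layers, linear output layer."""
    x = np.asarray(state, dtype=float)
    for i, (weights, bias) in enumerate(params):
        x = x @ weights + bias
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


params = init_q_network(state_dim=3, n_actions=5)
print(q_values(params, [0.1, 0.2, 0.3]).shape)

memory = ReplayMemory(capacity=2)
for step in range(3):
    memory.push([step], 0, 0.0, [step + 1], False)
print(len(memory))
```

In a full training loop, batches drawn with `sample` would be used to update the network weights by gradient descent. That update step is omitted here to keep the sketch short.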