Reinforcement learning for robust dynamic metabolic control

Sebastián Espinel-Ríos, River Walser, Dongda Zhang

TL;DR

The paper presents a reinforcement learning (RL) framework for robust dynamic metabolic control that combines forward integration of a surrogate model with domain randomization to derive policies for time-varying enzyme expression in bioprocesses. Because it avoids the explicit model differentiation required by model predictive control (MPC), the approach can train policies that maximize a biologically meaningful return and generalize across uncertainties. Demonstrations in two E. coli systems show substantial gains over static control, with up to ~41% higher fatty acid titer and ~28% higher final lactate titer, while maintaining stability under realistic disturbances. The work highlights the potential of RL to streamline design-build-test cycles for dynamic metabolic engineering and to support in silico exploration of circuit topologies before experimental implementation.

Abstract

Dynamic metabolic control allows key metabolic fluxes to be modulated in real time, enhancing bioprocess flexibility and expanding the available optimization degrees of freedom. This is achieved, e.g., via targeted modulation of metabolic enzyme expression. However, identifying optimal dynamic control policies is challenging due to the generally high-dimensional solution space and the need to manage metabolic burden and cytotoxic effects arising from inducible enzyme expression. The task is further complicated by stochastic dynamics, which reduce bioprocess reproducibility. We propose a reinforcement learning framework to derive optimal policies by allowing an agent (the controller) to interact with a surrogate dynamic model. To promote robustness, we apply domain randomization, enabling the controller to generalize across uncertainties. When transferred to an experimental system, the agent can in principle continue fine-tuning the policy. Our framework provides an alternative to conventional model-based control such as model predictive control, which requires differentiating the model with respect to the decision variables, a step that is often impractical for complex stochastic, nonlinear, stiff, and piecewise-defined dynamics. In contrast, our approach relies only on forward integration of the model, thereby simplifying the task. We demonstrate the framework in two Escherichia coli bioprocesses: dynamic control of acetyl-CoA carboxylase for fatty-acid synthesis and of adenosine triphosphatase for lactate synthesis.
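The two ingredients named in the abstract, forward integration of a surrogate model and domain randomization, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three-state toy batch model, the parameter names and values in `NOMINAL`, the explicit Euler integrator, the uniform perturbation scheme, and the use of the final product concentration as the return. The point is only that evaluating a policy needs simulated rollouts, not model derivatives.

```python
import numpy as np

# Hypothetical nominal parameters of a toy batch model (NOT from the paper).
NOMINAL = {"mu_max": 0.5, "K_s": 0.1, "Y_xs": 0.5, "k_p": 0.2, "burden": 0.5}
X0_NOMINAL = np.array([0.1, 10.0, 0.0])  # biomass X, substrate S, product P


def randomize(nominal, rel_unc, rng):
    """Domain randomization: scale each parameter by a factor in [1 - rel_unc, 1 + rel_unc]."""
    return {k: v * rng.uniform(1.0 - rel_unc, 1.0 + rel_unc) for k, v in nominal.items()}


def rhs(x, u, p):
    """Toy surrogate dynamics; the input u in [0, 1] trades growth for product synthesis."""
    X, S, _ = x
    monod = S / (p["K_s"] + S)
    mu = p["mu_max"] * monod * (1.0 - p["burden"] * u)  # growth slowed by expression burden
    q_p = p["k_p"] * u * monod                          # specific production rate
    return np.array([mu * X, -(mu / p["Y_xs"] + q_p) * X, q_p * X])


def rollout(policy, p, x0, t_end=24.0, n_steps=12, substeps=50):
    """Forward-integrate with piecewise-constant inputs (explicit Euler); return final product."""
    x = x0.copy()
    dt = t_end / (n_steps * substeps)
    for k in range(n_steps):
        u = float(np.clip(policy(k, x), 0.0, 1.0))
        for _ in range(substeps):
            x = np.maximum(x + dt * rhs(x, u, p), 0.0)  # keep concentrations non-negative
    return x[2]


def evaluate(policy, rel_unc=0.2, n_episodes=200, seed=0):
    """Mean and standard deviation of the return over randomized episodes."""
    rng = np.random.default_rng(seed)
    returns = []
    for _ in range(n_episodes):
        p = randomize(NOMINAL, rel_unc, rng)
        x0 = X0_NOMINAL * rng.uniform(1.0 - rel_unc, 1.0 + rel_unc, size=3)
        returns.append(rollout(policy, p, x0))
    return float(np.mean(returns)), float(np.std(returns))


def static_policy(k, x):
    return 0.5  # constant induction level


def two_stage_policy(k, x):
    return 0.0 if k < 4 else 1.0  # grow first, then induce


if __name__ == "__main__":
    for name, pol in [("static", static_policy), ("two-stage", two_stage_policy)]:
        mean, sd = evaluate(pol)
        print(f"{name}: mean return = {mean:.3f}, SD = {sd:.3f}")
```

In an RL setting, a trainable policy would replace the hand-written `static_policy` and `two_stage_policy`, and each training episode would draw a fresh parameter set from `randomize`, so the learned controller is exposed to the spread of plausible process behaviors rather than to a single nominal model.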

Paper Structure

This paper contains 15 sections, 16 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Overview of the RL framework for robust dynamic metabolic control. $\bm{u}_t$: action/input at time $t$; $\bm{s}_t$: featurized system representation at time $t$; $\pi(\cdot)$: stochastic policy; $\bm{m}_t, \bm{\sigma}_t$: mean and standard deviation of Gaussian policy at time $t$; $\bm{\tau}$: joint trajectory of observed states, actions, and rewards; $J(\bm{\tau})$: return over $\bm{\tau}$; $\bm{\theta}$: policy parameters; $\bm{\Theta}$: deep neural network (DNN) parameters.
  • Figure 2: Overview of the first case study. RL framework for robust dynamic metabolic control coupled to a fatty acid biosynthetic process with ACC modulation. $\bm{u}_t$: action/input at time $t$; $\bm{s}_t$: featurized system representation at time $t$; $\pi(\cdot)$: stochastic policy; $\bm{m}_t, \bm{\sigma}_t$: mean and standard deviation of Gaussian policy at time $t$; $\bm{\tau}$: joint trajectory of observed states, actions, and rewards; $J(\bm{\tau})$: return over $\bm{\tau}$; $\bm{\theta}$: policy parameters; $\bm{\Theta}$: deep neural network (DNN) parameters.
  • Figure 3: Overview of the second case study. RL framework for robust dynamic metabolic control coupled to a lactate biosynthetic process with ATPase modulation. $\bm{u}_t$: action/input at time $t$; $\bm{s}_t$: featurized system representation at time $t$; $\pi(\cdot)$: stochastic policy; $\bm{m}_t, \bm{\sigma}_t$: mean and standard deviation of Gaussian policy at time $t$; $\bm{\tau}$: joint trajectory of observed states, actions, and rewards; $J(\bm{\tau})$: return over $\bm{\tau}$; $\bm{\theta}$: policy parameters; $\bm{\Theta}$: deep neural network (DNN) parameters.
  • Figure 4: Metabolic control results under ideal conditions (i.e., no system uncertainties) for the fatty acid biosynthesis case study with ACC modulation. (a) Evolution of the return function over epochs, up to the epoch with the highest mean value (selected control policy). The corresponding (b) input trajectory and (c)-(h) dynamic state trajectories associated with the selected control policy are also shown. The RL-derived dynamic control scenario (DC) is benchmarked against the static control scenario (SC). Uncertainty bands correspond to 500 episodes or trajectories. SD: standard deviation; $J$: return; $u$: control input (inducer); $X$: biomass; $S$: glucose; $R$: LacI; $M$: malonyl-CoA; $E$: manipulatable enzyme (ACC); $P$: fatty acid.
  • Figure 5: Control policies robust against system uncertainty for the fatty acid biosynthesis case study with ACC modulation, considering (a) 10 %, (b) 15 %, (c) 20 %, and (d) 25 % uncertainty in the initial conditions and key kinetic parameters affecting the expression of LacI and ACC. The RL-derived dynamic control scenario (DC) is benchmarked against the static control scenario (SC). The return function is presented up to the epoch with the highest mean return value, matching the chosen policy. Selected dynamic state trajectories correspond to the latter policy. Uncertainty bands correspond to 500 episodes or trajectories. SD: standard deviation; $J$: return; $u$: control input (inducer); $E$: manipulatable enzyme (ACC); $P$: fatty acid.
  • ...and 2 more figures
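
The symbols in the Figure 1 caption (a stochastic Gaussian policy with mean $\bm{m}_t$ and standard deviation $\bm{\sigma}_t$, trajectories $\bm{\tau}$, return $J(\bm{\tau})$, policy parameters $\bm{\theta}$) can be made concrete with a small sketch. The excerpt above does not state which update rule the authors use, so the following is a generic score-function (REINFORCE-style) gradient estimate, shown only as one common way to train such a policy. The linear mean, the state-independent standard deviation, and all function names are assumptions for illustration; the paper's policy is parameterized by a deep neural network.

```python
import numpy as np


def gaussian_policy(theta, s):
    """Return (m_t, sigma_t) for features s; theta = [w..., b, log_sigma] (hypothetical layout)."""
    w, b, log_sigma = theta[:-2], theta[-2], theta[-1]
    return float(w @ s + b), float(np.exp(log_sigma))


def sample_action(theta, s, rng):
    """Draw u_t ~ N(m_t, sigma_t^2)."""
    m, sigma = gaussian_policy(theta, s)
    return rng.normal(m, sigma)


def grad_log_prob(theta, s, u):
    """Gradient of log N(u | m, sigma^2) with respect to theta."""
    m, sigma = gaussian_policy(theta, s)
    z = (u - m) / sigma
    d_m = z / sigma              # d log pi / d m
    d_log_sigma = z**2 - 1.0     # d log pi / d log_sigma
    return np.concatenate([d_m * s, [d_m, d_log_sigma]])


def reinforce_gradient(theta, episodes):
    """Score-function estimate of grad_theta E[J]; episodes = [(states, actions, J), ...]."""
    baseline = np.mean([J for _, _, J in episodes])  # mean-return baseline reduces variance
    grad = np.zeros_like(theta)
    for states, actions, J in episodes:
        score = sum(grad_log_prob(theta, s, u) for s, u in zip(states, actions))
        grad += (J - baseline) * score
    return grad / len(episodes)
```

A gradient-ascent step would then read `theta = theta + lr * reinforce_gradient(theta, episodes)`, where `lr` is a hypothetical learning rate and `episodes` are collected by rolling out the current policy on the surrogate model. Note that the estimate uses only sampled returns and the policy's own log-probability gradients, which is why no derivative of the process model is needed.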