Optimal control of stochastic reaction networks with entropic control cost and emergence of mode-switching strategies

Shuhei A. Horiguchi; Tetsuya J. Kobayashi

TL;DR

This work develops a unified framework for the optimal control of stochastic reaction networks (RNs) with discrete, nonnegative counts and absorbing states. It uses $f$-divergence–based control costs and, in particular, the KL divergence, which linearizes the Hamilton–Jacobi–Bellman (HJB) equation via the Cole–Hopf transformation. It derives a time-dependent state-feedback controller $k^{\dagger}_r(t,n,\beta)=k^0_r\exp(\overline{\nabla}_{s_r}V_t(n,\beta))$ and a linear backward equation for $Z_t(n,\beta)=\exp(V_t(n,\beta))$, enabling efficient computation and a probabilistic interpretation through the Feynman–Kac representation. The framework is demonstrated on interacting random walkers, Moran processes, and SIR epidemic models. These examples expose mode-switching behavior in the optimal controls and show that a finite per-time control cost can prevent extinction in Moran-type dynamics. By connecting control with path-measure optimization, the method offers scalable, extinction-aware strategies broadly applicable to RN-based models in biology and epidemiology, and it outlines directions for risk-sensitive extensions and partially observed settings.
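The step from the KL cost to a linear equation can be sketched as follows. This sketch assumes a mass-action baseline propensity $\lambda_r^0(n)=k_r^0 h_r(n)$, a state-dependent reward rate $q(n)$ to be maximized, and the KL cost rate $\sum_r[\lambda_r\ln(\lambda_r/\lambda_r^0)-\lambda_r+\lambda_r^0]$. The symbols $\lambda_r$, $h_r$, and $q$ are introduced here for illustration and may not match the paper's notation; the dependence on $\beta$ is suppressed.

$$
\begin{aligned}
0 &= \partial_t V_t(n) + q(n) + \max_{\lambda}\sum_r\Big[\lambda_r\,\overline{\nabla}_{s_r}V_t(n) - \lambda_r\ln\frac{\lambda_r}{\lambda_r^0(n)} + \lambda_r - \lambda_r^0(n)\Big],\\
\lambda_r^{\dagger}(n) &= \lambda_r^0(n)\,e^{\overline{\nabla}_{s_r}V_t(n)},\qquad \overline{\nabla}_{s_r}V_t(n) = V_t(n+s_r)-V_t(n),\\
0 &= \partial_t V_t(n) + q(n) + \sum_r \lambda_r^0(n)\big(e^{\overline{\nabla}_{s_r}V_t(n)}-1\big),\\
-\partial_t Z_t(n) &= q(n)\,Z_t(n) + \sum_r \lambda_r^0(n)\big(Z_t(n+s_r)-Z_t(n)\big),\qquad Z_t(n)=e^{V_t(n)}.
\end{aligned}
$$

Maximizing each term separately gives the controlled propensity in the second line. Dividing by $h_r(n)$ reproduces $k^{\dagger}_r=k^0_r\exp(\overline{\nabla}_{s_r}V_t)$. Multiplying the resulting HJB equation by $Z_t(n)=e^{V_t(n)}$ turns $e^{\overline{\nabla}_{s_r}V_t(n)}Z_t(n)$ into $Z_t(n+s_r)$, which removes the nonlinearity. Under these assumptions, the Feynman–Kac formula then gives $Z_t(n)=\mathbb{E}[\exp(\int_t^T q(n_s)\,ds)\,Z_T(n_T)\mid n_t=n]$ over uncontrolled paths.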

Abstract

Controlling the stochastic dynamics of biological populations is a challenge that arises across various biological contexts. However, these dynamics are inherently nonlinear and involve a discrete state space, i.e., the number of molecules, cells, or organisms. Additionally, the possibility of extinction has a significant impact on both dynamics and control strategies, particularly when the population size is small. These factors hamper the direct application of conventional control theories to biological systems. To address these challenges, we formulate the optimal control problem for stochastic population dynamics by utilizing control cost functions based on the $f$-divergence, which naturally accounts for population-specific factors. If the Kullback-Leibler (KL) divergence is adopted for the cost function, the complex nonlinear Hamilton-Jacobi-Bellman equation is simplified into a linear form, facilitating efficient computation of optimal solutions. We demonstrate the effectiveness of our approach by applying it to the control of interacting random walkers, Moran processes, and SIR models, and observe mode-switching phenomena in the control strategies. Our approach provides new opportunities for applying control theory to a wide range of biological problems.
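The linear form is what makes the optimal solution computable with plain linear algebra. Below is a minimal sketch for a Moran-type first-exit problem, and several of its ingredients are assumptions rather than the paper's code. The propensities $\lambda_1^0(n)=k_1^0\,n(N-n)/N$ (step $+1$) and $\lambda_2^0(n)=k_2^0\,n(N-n)/N$ (step $-1$) are assumed. The objective (reward $\beta$ per unit time before absorption minus the KL cost) is also assumed, which gives the stationary equation $\beta Z(n)+\sum_r\lambda_r^0(n)(Z(n+s_r)-Z(n))=0$ with $Z=1$ at $n_A=0$ and $n_A=N$. The helper names `moran_generator`, `critical_beta`, `solve_first_exit`, `optimal_rate_constants` and the values $N=20$, $\gamma=1.5$ are hypothetical and chosen only for illustration.

```python
import numpy as np


def moran_generator(N, k1, k2):
    """Hypothetical helper: generator of an uncontrolled Moran-type chain on n_A = 0..N.

    Assumed propensities (not taken from the paper):
      reaction 1 (s_1 = +1): lam1(n) = k1 * n * (N - n) / N
      reaction 2 (s_2 = -1): lam2(n) = k2 * n * (N - n) / N
    Rows 0 and N stay zero because those states are absorbing.
    """
    n = np.arange(N + 1)
    lam1 = k1 * n * (N - n) / N
    lam2 = k2 * n * (N - n) / N
    L = np.zeros((N + 1, N + 1))
    for i in range(1, N):
        L[i, i + 1] = lam1[i]
        L[i, i - 1] = lam2[i]
        L[i, i] = -(lam1[i] + lam2[i])
    return L


def critical_beta(L):
    """Decay rate of the uncontrolled survival probability: -(top eigenvalue of L_II)."""
    L_II = L[1:-1, 1:-1]
    return -np.linalg.eigvals(L_II).real.max()


def solve_first_exit(L, beta):
    """Solve beta*Z + L Z = 0 on interior states with Z = 1 on absorbing states.

    Assumed objective: reward beta per unit time before absorption minus KL cost,
    so that Z(n) = E[exp(beta * tau)] for the uncontrolled exit time tau.
    """
    N = L.shape[0] - 1
    inner = np.arange(1, N)
    absorbing = np.array([0, N])
    A = L[np.ix_(inner, inner)] + beta * np.eye(N - 1)
    b = -L[np.ix_(inner, absorbing)] @ np.ones(2)
    Z = np.ones(N + 1)
    Z[inner] = np.linalg.solve(A, b)
    return Z


def optimal_rate_constants(Z, k1, k2):
    """k_r^dagger(n) = k_r^0 * exp(V(n + s_r) - V(n)) with V = log Z, for n = 1..N-1."""
    V = np.log(Z)
    k1_opt = k1 * np.exp(V[2:] - V[1:-1])   # s_1 = +1
    k2_opt = k2 * np.exp(V[:-2] - V[1:-1])  # s_2 = -1
    return k1_opt, k2_opt


N, gamma = 20, 1.5  # illustrative values only (Fig. 4 uses N = 100)
L = moran_generator(N, k1=gamma, k2=1.0)
beta_c = critical_beta(L)
Z = solve_first_exit(L, beta=0.5 * beta_c)
k1_opt, k2_opt = optimal_rate_constants(Z, k1=gamma, k2=1.0)
print(f"beta_c = {beta_c:.4f}, V(N/2) = {np.log(Z[N // 2]):.4f}")
```

At $\beta=0$ the solve returns $Z\equiv 1$, so $V\equiv 0$ and the controlled rates equal the baseline rates. As $\beta$ approaches `beta_c`, the matrix passed to `np.linalg.solve` becomes singular. One plausible reading is that this is the divergence of $V$ at $\beta_c$ shown in Fig. 4(a), but that identification is an assumption of this sketch.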

Paper Structure (21 sections, 133 equations, 8 figures)

Introduction
Optimal control of stochastic reaction systems
Stochastic reaction systems
General formulation of optimal control problems for stochastic RNs
First exit optimal control problems
Choice of the control cost function
Optimal control with KL cost
Controlling a random walker
Controlling interacting random walkers
Controlling survival in birth and death processes
Controlling epidemic outbreak
Discussion
Details about optimal control problem
Derivation of the $f$-divergence control cost
Path measure perspective and the KL control cost
...and 6 more sections

Figures (8)

Figure 1: Examples of biological phenomena described as reaction networks. (a) Movement of molecular motors on a microtubule with mutual interference. (b) Competing dynamics of populations in population genetics or ecology. (c) Spread of infectious diseases in a population. Purple thick lines in the panels represent the absorbing states in each phenomenon.
Figure 2: Diagram showing the properties of the control cost. The control of reaction fluxes is visualized as the control of flow speed in channels. $j_r^0$ and $j_r$ are the uncontrolled (input) and the controlled (output) reaction fluxes, respectively. $a, b \in [0, 1]$ are the ratios by which the input and output fluxes are divided.
Figure 3: The results of the minimum exit time problem for interacting random walkers in one-dimensional space. (a) The value function $V(x_1, x_2, \beta)$ plotted by color in the state space $(x_1, x_2)$ for different $\beta$. The dashed curves are its contours. The blue zigzag lines are the trajectories of $100$ independent simulations starting from $(10, 11)$. (b) The value function $V(x_1, x_2, \beta)$ for fixed $x_1 = 0$ (upper panel) and $x_1 = 10$ (lower panel) for different $\beta$, indicated by the dots. The dashed lines are calculated with Eq. [eq:value_minimum_time_random_walk]. The color code represents the value of $\beta$. (c) The directional derivative $\overline{\nabla}_2 V(x_1, x_2, \beta) = V(x_1, x_2+1, \beta) - V(x_1, x_2, \beta)$ of the value function in the $x_2$ direction for fixed $x_1 = 0$ and $x_1 = 10$. The dashed lines correspond to Eq. [eq:value_minimum_time_random_walk]. The format of the panels is the same as in (b). The parameters are $M=31$, $N=2$, $k^0=0.1$.
Figure 4: The results of the maximum exit time problem for Moran processes with $N=100$. (a) Dependence of the value function $V(n, \beta)$ on $\beta$ at $n_A=N/2$ for varying $\gamma$, the selective advantage of $A$ over $B$; $\gamma=1$ is the neutral case. Both the linear algebraic solution (colored points) and the analytical solution (colored curves) diverge at $\beta=\beta_c$ (black dashed lines). (b) The relationship $\mathcal{E}(n, u)$ between the expected first exit time $u$ and the control cost, starting from the state $n_A=N/2$. The format of the plot is the same as in (a). (c) The relationship $\mathcal{E}(n, u)/u$ between the expected first exit time and the control cost rate, starting from the state $n_A=N/2$. The black dashed lines represent $\beta_c$ for each $\gamma$. The format of the plot is the same as in (a). (d, e) Dependence of the value function $V(n, \beta)$ (d) and its derivative $\overline{\nabla}_{s_1}V(n, \beta)$ (e) on the initial state $n_A$ for different values of $\beta$ and $\gamma$. In each panel, the values of $\beta$ are sampled at equal intervals between $0$ and $\beta_c$ and color-coded. (f) The optimally controlled stochastic trajectories from time $0$ to $T=100$ for different $\beta$ and $\gamma$. Each panel shows $100$ trajectories with initial condition $n_A(0)=1$ for different values of $\beta$, color-coded. The parameters are $k_1^0=\gamma, k_2^0=1.0$.
Figure 5: Comparison of maximum exit time solutions for Moran processes with different control cost functions. (a–c) The value functions with the quadratic control cost function $C_{quad}$, shown as in Fig. 4(a–c). (d–f) The value functions and optimally controlled trajectories with the quadratic $C_{quad}$, weighted KL $C_{wKL}$ ($w_1=\infty, w_2=1$), KL $C_{KL}$, and weighted KL $C_{wKL}$ ($w_1=1, w_2 = \infty$) control cost functions. The parameters are $N=100, k_1^0=1.5, k_2^0=1.0$.
...and 3 more figures
