Reinforcement Learning for Control of Non-Markovian Cellular Population Dynamics
Josiah C. Kratz, Jacob Adamczyk
TL;DR
This work addresses the control of non-Markovian cell populations, in which memory of past environments shapes the response to drug dosing, with the goal of suppressing proliferation. Memory is formalized through a memory kernel with memory strength $\mu\in(0,1]$, and the terminal objective is $C=\log\left(N(T)/N(0)\right)$, where $N(t)=S(t)+R(t)$. The paper introduces a memory-enabled two-state model with susceptible ($S$) and resistant ($R$) cells and proves that the optimal control is bang-bang under a monotone dose–response, which motivates the use of end-to-end reinforcement learning to discover effective dosing policies. The authors demonstrate that model-free deep RL recovers the optimal policy in the memoryless case and, when memory is present, learns memory-aware dosing strategies that remain robust under observation noise and outperform baselines. They further improve generalization by combining domain randomization over memory strength with distributional RL (FQF), which helps the agent cope with uncertain memory and noise across the tested scenarios. The results highlight the potential of RL-guided adaptive dosing in clinical contexts: the learned bang-bang policies remain effective despite non-Markovian dynamics and measurement perturbations.
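As a minimal sketch of the objective stated above, the snippet below evaluates $C$ from initial and final subpopulation sizes. The function name and argument names are hypothetical, and reading the logarithm as applying to the whole ratio $N(T)/N(0)$ is an assumption about the notation, not something the summary states explicitly.

```python
import math


def log_fold_change(s0: float, r0: float, s_T: float, r_T: float) -> float:
    """Return C = log(N(T) / N(0)) with N(t) = S(t) + R(t).

    Hypothetical helper: the argument names are illustrative only.
    Assumes the log is taken of the ratio N(T)/N(0).
    """
    n0 = s0 + r0    # total population at t = 0
    n_T = s_T + r_T  # total population at t = T
    if n0 <= 0 or n_T <= 0:
        raise ValueError("total population must be positive at both times")
    return math.log(n_T / n0)
```

Under this reading, $C<0$ means the total population shrank over the horizon and $C>0$ means it grew, so a dosing policy that suppresses proliferation aims to make $C$ as small as possible.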
Abstract
Many organisms and cell types, from bacteria to cancer cells, exhibit a remarkable ability to adapt to fluctuating environments. Additionally, cells can leverage a memory of past environments to better survive previously encountered stressors. From a control perspective, this adaptability makes it significantly harder to drive cell populations toward extinction, and how best to do so remains an open question of great clinical significance. In this work, we focus on drug dosing in cell populations exhibiting phenotypic plasticity. For specific dynamical models of switching between resistant and susceptible states, exact solutions are known. However, when the underlying system parameters are unknown, and for complex memory-based systems, obtaining the optimal solution is currently intractable. To address this challenge, we apply reinforcement learning (RL) to identify informed dosing strategies to control cell populations evolving under novel non-Markovian dynamics. We find that model-free deep RL is able to recover exact solutions and control cell populations even in the presence of long-range temporal dynamics. To further test our approach in more realistic settings, we demonstrate robust RL-based control strategies in environments with measurement noise and dynamic memory strength.
