Run-and-tumble chemotaxis using reinforcement learning

Ramesh Pramanik; Shradha Mishra; Sakuntala Chatterjee

TL;DR

We ask how reinforcement learning (RL) can reproduce run-and-tumble chemotaxis in spatial attractant gradients while also learning the environment. The approach uses a one-dimensional RL framework with two actions (Persist, Reverse), a history-based cost that compares $[L](x(t-\Delta_1))$ with $[L](x(t-\Delta_2))$ for $\Delta_1=1$, $\Delta_2=2$, and $Q$-learning governed by $\alpha$ and $\epsilon$, applied to sinusoidal and multi-peak attractant profiles. Long-time localization, quantified by the uptake $\langle C\rangle$, and the ability to learn the full landscape both depend nontrivially on $\epsilon$ and $\alpha$: trapping effects arise when the peaks are unequal, and clear optimal regions appear in the $\epsilon$–$\alpha$ plane. The mean run duration in homogeneous settings is $\tau=\dfrac{2}{\epsilon(1-p_0)}$, whereas structured environments exhibit a nonmonotonic $\tau(\epsilon)$ and optimal search times, measured by mean first passage times. The work provides a quantitative link between reinforcement-learning strategies and chemotactic navigation, offering insights into bacterial behavior and guiding the design of RL-guided microrobots operating in gradient fields.
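
As a quick numerical reading of the homogeneous-medium expression $\tau = 2/[\epsilon(1-p_0)]$ quoted above, the short sketch below evaluates it for a few values of $\epsilon$. The function name `mean_run_duration` and the input guard are ours, introduced only for illustration; $p_0 = 0.9$ is the value given in the caption of Figure 2.

```python
def mean_run_duration(eps, p0):
    """Evaluate tau = 2 / (eps * (1 - p0)) as quoted in the TL;DR.

    The range check only guards against division by zero; it is an
    assumption of this sketch, not a statement from the paper.
    """
    if eps <= 0.0 or p0 >= 1.0:
        raise ValueError("need eps > 0 and p0 < 1")
    return 2.0 / (eps * (1.0 - p0))


# p0 = 0.9 as in Figure 2; the eps values are arbitrary examples.
for eps in (0.1, 0.2, 0.4):
    print(f"eps = {eps:.1f}  ->  tau = {mean_run_duration(eps, 0.9):.1f}")
```

With $p_0 = 0.9$ the expression reduces to $\tau = 20/\epsilon$, so doubling $\epsilon$ halves the mean run duration, which is the inverse proportionality described for Figure 2.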

Abstract

Bacterial cells use run-and-tumble motion to climb attractant concentration gradients in their environment. By extending uphill runs and shortening downhill runs, the cells migrate towards zones of higher attractant concentration. Motivated by this, we formulate a reinforcement learning (RL) algorithm in which an agent moves in one dimension in the presence of an attractant gradient. The agent can perform two actions: persistent motion in the same direction or reversal of direction. We assign costs to these actions based on the recent history of the agent's trajectory. We ask which RL strategy works best in different types of attractant environment. We quantify the efficiency of an RL strategy by the agent's ability (a) to localize in the favorable zones at long times, and (b) to learn about its complete environment. Depending on the attractant profile and the initial condition, we find that an optimum balance between exploration and exploitation is needed for the most efficient performance.
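
To make the setup in the abstract concrete, here is a minimal sketch of a generic tabular $Q$-learning agent with the two actions Persist and Reverse on a periodic one-dimensional lattice. It is not the authors' formulation. The state encoding (whether the concentration rose over the last step), the $\pm 1$ reward, the discount factor `gamma`, the unit lattice step, and all function names are assumptions made for illustration, and the parameter $p_0$ is not modelled. The profile and the defaults $\alpha = 0.001$, $\epsilon = 0.2$ are taken from the figure captions.

```python
import math
import random

L0, LAM, SIZE = 5.0, 500.0, 1000   # [L]_0, lambda, system size (Figure 1)
PERSIST, REVERSE = 0, 1            # the two actions


def conc(x):
    """Sine-wave attractant profile [L](x) = [L]_0 + sin(2 pi x / lambda)."""
    return L0 + math.sin(2 * math.pi * x / LAM)


def run_agent(steps=20000, alpha=0.001, eps=0.2, gamma=0.9, seed=0):
    """Epsilon-greedy tabular Q-learning; returns (Q table, mean uptake)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]   # q[state][action]
    x = rng.randrange(SIZE)
    direction = rng.choice((-1, 1))
    prev = x                       # position one step back
    state = 0                      # ASSUMED state: 1 if [L] rose on the last step
    uptake = 0.0
    for _ in range(steps):
        # Explore with probability eps, otherwise exploit the current Q values.
        if rng.random() < eps:
            action = rng.choice((PERSIST, REVERSE))
        else:
            action = max((PERSIST, REVERSE), key=lambda a: q[state][a])
        if action == REVERSE:
            direction = -direction
        prev, x = x, (x + direction) % SIZE   # periodic boundary
        # ASSUMED reward: +1 if the step went up the gradient, -1 otherwise.
        rose = conc(x) > conc(prev)
        reward = 1.0 if rose else -1.0
        next_state = 1 if rose else 0
        # Standard Q-learning update with learning rate alpha.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
        uptake += conc(x)
    return q, uptake / steps


if __name__ == "__main__":
    q_table, mean_c = run_agent()
    print("Q table:", q_table)
    print("mean uptake:", mean_c)
```

The returned mean concentration along the trajectory plays the role of an uptake-like quantity; comparing it across values of `eps` and `alpha` is one way to explore the exploration-exploitation balance the abstract refers to, within the limits of the assumptions above.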

Paper Structure (10 sections, 2 equations, 12 figures, 1 table)

Introduction
Formulation of RL algorithm
Performance of RL agent in sine wave attractant environment
Shortest run for a specific exploration parameter
Exploration-exploitation competition affects performance
Attractant profile with different peak heights
Performance peak for optimal $\epsilon$ and $\alpha$
First passage time from lower peak to higher peak of attractant profile
Discussions
Acknowledgements

Figures (12)

Figure 1: Attractant concentration profile $[L](x)$. The solid line corresponds to $[L](x) = [L]_0+ \sin (2 \pi x / \lambda)$ and the dashed line to $[L](x) = [L]_0+ \sin (2 \pi x / \lambda_1) + \sin (2 \pi x / \lambda_2)$. Here, $[L]_0$ is the background concentration, which also ensures that $[L](x)$ never becomes negative. Unless mentioned otherwise, we use $[L]_0 = 5$, $\lambda = \lambda_1 = 500$, $\lambda_2=1000$, and system size $L=1000$ with periodic boundary conditions throughout.
Figure 2: The average run duration in a homogeneous medium is inversely proportional to $\epsilon$. The discrete points show simulation data, and the continuous line shows the analytical expression. Here we have used $p_0=0.9$.
Figure 3: The position distribution of the agent after a long time in a sine wave attractant profile. The agent localizes more strongly near the two attractant peaks as $\epsilon$ decreases and/or $\alpha$ increases. Here we have used a uniform initial condition and $\Delta_1=1$, $\Delta_2=2$, $p_0=0.90$.
Figure 4: The mean run duration $\tau$ shows a minimum as a function of $\epsilon$. These data are for the sine wave attractant profile with $p_0=0.90$ and $\alpha=0.001$.
Figure 5: Uptake $\langle C \rangle$ decreases with $\epsilon$ and increases with $\alpha$. In panel (a) we have used $\alpha=0.001$ and in panel (b) $\epsilon=0.20$. All other simulation parameters are as in Fig. 3.
...and 7 more figures
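
The two attractant profiles defined in the caption of Figure 1 can be written down directly. The sketch below does so with the stated parameter values and locates their peaks on an integer lattice. The function names and the choice of an integer lattice are assumptions made for illustration.

```python
import math

L0 = 5.0       # background concentration [L]_0 (Figure 1)
LAM1 = 500.0   # lambda = lambda_1 (Figure 1)
LAM2 = 1000.0  # lambda_2 (Figure 1)
SIZE = 1000    # system size L with periodic boundaries (Figure 1)


def sine_profile(x):
    """Solid line of Figure 1: [L]_0 + sin(2 pi x / lambda)."""
    return L0 + math.sin(2 * math.pi * x / LAM1)


def two_mode_profile(x):
    """Dashed line of Figure 1: [L]_0 + sin(2 pi x / lambda_1) + sin(2 pi x / lambda_2)."""
    return (L0 + math.sin(2 * math.pi * x / LAM1)
            + math.sin(2 * math.pi * x / LAM2))


def grid_peaks(profile):
    """Sites that are strict local maxima on the periodic integer lattice (assumed grid)."""
    return [x for x in range(SIZE)
            if profile(x) > profile((x - 1) % SIZE)
            and profile(x) > profile((x + 1) % SIZE)]


if __name__ == "__main__":
    for name, f in (("sine", sine_profile), ("two-mode", two_mode_profile)):
        peaks = grid_peaks(f)
        heights = [round(f(x), 3) for x in peaks]
        print(name, "peaks at", peaks, "with heights", heights)
```

The single-sine profile has two peaks of equal height within the system, while the two-mode profile has two peaks of different heights, which is the setting of the section on attractant profiles with different peak heights.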
