Emergence of Chemotactic Strategies with Multi-Agent Reinforcement Learning
Samuel Tovey, Christoph Lohrmann, Christian Holm
TL;DR
The paper investigates how multi-agent actor-critic reinforcement learning can induce chemotactic behavior in microswimmers subject to Brownian motion. By training spherical, prolate, and oblate agents across a range of sizes and swim speeds in overdamped Langevin environments, the authors show that chemotaxis emerges as soon as it is physically possible, with three dominant strategies (Run and Rotate, Gradient Gliding, and Brownian Piloting) and occasional Exotic policies. The results align with phase-boundary predictions that place the onset of chemotaxis at translational and rotational Péclet numbers around unity, highlighting the role of $Pe^{trans}$ and $Pe^{rot}$ in constraining learnability. These findings offer mechanistic insights into plausible biological navigation strategies and provide design guidance for artificial microswimmers, suggesting size-speed regimes where the learned policies are most efficient. The study thus connects reinforcement learning with physical constraints to illuminate emergent search strategies in noisy environments.
Abstract
Reinforcement learning (RL) is a flexible and efficient method for programming micro-robots in complex environments. Here we investigate whether reinforcement learning can provide insights into biological systems when trained to perform chemotaxis, namely, whether we can learn how intelligent agents process the information available to them in order to swim towards a target. We run simulations covering a range of agent shapes, sizes, and swim speeds to determine whether the physical constraints on biological swimmers, namely Brownian motion, lead to regions where the training of reinforcement learners fails. We find that the RL agents can perform chemotaxis as soon as it is physically possible and, in some cases, even before the active swimming overpowers the stochastic environment. We study the efficiency of the emergent policy and identify convergence in agent size and swim speeds. Finally, we study the strategy adopted by the reinforcement learning algorithm to explain how the agents perform their tasks. To this end, we identify three dominant emergent strategies and several rare approaches. These strategies, whilst producing almost identical trajectories in simulation, are distinct and give insight into the possible mechanisms by which biological agents explore their environment and respond to changing conditions.
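The competition between active swimming and Brownian motion described above is commonly quantified with Péclet numbers. The following is a minimal sketch, not the authors' code: it assumes a spherical swimmer, Stokes-Einstein diffusion coefficients, a water-like viscosity, and one common convention for the translational and rotational Péclet numbers. The paper's exact definitions may differ, and the function names and default parameters are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def diffusion_coefficients(radius, temperature=300.0, viscosity=1.0e-3):
    """Stokes-Einstein coefficients for a sphere (assumed model).

    radius in m, temperature in K, viscosity in Pa*s (default is
    an assumed, roughly water-like value).
    Returns (translational D in m^2/s, rotational D in 1/s).
    """
    thermal_energy = K_B * temperature
    d_trans = thermal_energy / (6.0 * math.pi * viscosity * radius)
    d_rot = thermal_energy / (8.0 * math.pi * viscosity * radius**3)
    return d_trans, d_rot


def peclet_numbers(radius, speed, temperature=300.0, viscosity=1.0e-3):
    """Dimensionless ratios of active swimming to diffusion.

    Assumed convention (may differ from the paper):
      Pe_trans = v * a / D_trans
      Pe_rot   = v / (a * D_rot)
    """
    d_trans, d_rot = diffusion_coefficients(radius, temperature, viscosity)
    pe_trans = speed * radius / d_trans
    pe_rot = speed / (radius * d_rot)
    return pe_trans, pe_rot


if __name__ == "__main__":
    # Illustrative values only: radii in m, one fixed swim speed in m/s.
    for radius in (0.1e-6, 0.5e-6, 1.0e-6):
        pe_t, pe_r = peclet_numbers(radius, speed=10e-6)
        print(f"a = {radius:.1e} m  Pe_trans = {pe_t:.3g}  Pe_rot = {pe_r:.3g}")
```

Under these assumptions both numbers grow with swim speed and with the square of the radius, so small, slow swimmers fall below unity, where diffusion dominates and directed swimming becomes difficult.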
