Emergence of Chemotactic Strategies with Multi-Agent Reinforcement Learning

Samuel Tovey; Christoph Lohrmann; Christian Holm

TL;DR

The paper investigates how multi-agent actor-critic reinforcement learning can induce chemotactic behavior in microswimmers subject to Brownian motion. By training spherical, prolate, and oblate agents across a range of sizes and swim speeds in overdamped Langevin environments, the authors find that chemotaxis emerges as soon as it is physically possible. Three dominant strategies appear (Run and Rotate, Gradient Gliding, and Brownian Piloting), along with occasional Exotic policies. The results align with phase-boundary predictions based on translational and rotational Péclet numbers around unity, highlighting the role of $Pe^{trans}$ and $Pe^{rot}$ in constraining learnability. These findings offer mechanistic insights into plausible biological navigation strategies and provide design guidance for artificial microswimmers, suggesting optimal size-speed regimes where learning is most efficient. The study thus connects reinforcement learning with physical constraints to illuminate emergent search strategies in noisy environments.
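To make the Péclet-number criterion concrete, the following is a minimal sketch that estimates $Pe^{trans}$ and $Pe^{rot}$ for a spherical swimmer. It rests on assumptions that are not taken from the paper: Stokes-Einstein diffusion coefficients for a sphere in a water-like fluid, and the conventions $Pe^{trans} = v a / D_t$ and $Pe^{rot} = v / (a D_r)$. The paper's own definitions may differ by constant factors, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def sphere_diffusion(radius, temperature=300.0, viscosity=1.0e-3):
    """Stokes-Einstein diffusion coefficients of a sphere (assumed model).

    Returns (translational D_t in m^2/s, rotational D_r in rad^2/s).
    """
    thermal_energy = K_B * temperature
    d_trans = thermal_energy / (6.0 * math.pi * viscosity * radius)
    d_rot = thermal_energy / (8.0 * math.pi * viscosity * radius ** 3)
    return d_trans, d_rot


def peclet_numbers(radius, speed, temperature=300.0, viscosity=1.0e-3):
    """Translational and rotational Peclet numbers under assumed conventions.

    Pe_trans = v * a / D_t   (active displacement vs. translational diffusion)
    Pe_rot   = v / (a * D_r) (active displacement vs. rotational diffusion)
    """
    d_trans, d_rot = sphere_diffusion(radius, temperature, viscosity)
    pe_trans = speed * radius / d_trans
    pe_rot = speed / (radius * d_rot)
    return pe_trans, pe_rot


if __name__ == "__main__":
    # Hypothetical swimmer: 1 micrometre radius, 1 micrometre per second.
    pe_t, pe_r = peclet_numbers(radius=1.0e-6, speed=1.0e-6)
    print(f"Pe_trans = {pe_t:.2f}, Pe_rot = {pe_r:.2f}")
```

Under these assumed conventions both numbers scale with $v a^2$, so halving the radius at fixed speed reduces each by a factor of four. This is one way to see why small, slow swimmers fall into the regime where diffusion dominates active motion.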

Abstract

Reinforcement learning (RL) is a flexible and efficient method for programming micro-robots in complex environments. Here we investigate whether reinforcement learning can provide insights into biological systems when trained to perform chemotaxis. Specifically, we ask whether we can learn how intelligent agents process given information in order to swim towards a target. We run simulations covering a range of agent shapes, sizes, and swim speeds to determine if the physical constraints on biological swimmers, namely Brownian motion, lead to regions where reinforcement learners' training fails. We find that the RL agents can perform chemotaxis as soon as it is physically possible and, in some cases, even before the active swimming overpowers the stochastic environment. We study the efficiency of the emergent policy and identify convergence in agent size and swim speeds. Finally, we study the strategy adopted by the reinforcement learning algorithm to explain how the agents perform their tasks. To this end, we identify three dominant emergent strategies and several rare approaches. These strategies, whilst producing almost identical trajectories in simulation, are distinct and give insight into the possible mechanisms by which biological agents explore their environment and respond to changing conditions.
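The competition between active swimming and Brownian motion described above can be illustrated with a minimal sketch of an overdamped Langevin update for a self-propelled particle. This is not the paper's ESPResSo setup. It assumes a two-dimensional active Brownian particle with constant swim speed, integrated with the Euler-Maruyama scheme, and all names and parameter values are hypothetical.

```python
import math
import random


def abp_step(x, y, theta, speed, d_trans, d_rot, dt, rng):
    """One Euler-Maruyama step of a 2D active Brownian particle (assumed model).

    The particle swims along its heading theta at a constant speed while
    translational (d_trans) and rotational (d_rot) diffusion perturb its
    position and heading.
    """
    sigma_t = math.sqrt(2.0 * d_trans * dt)  # translational noise amplitude
    sigma_r = math.sqrt(2.0 * d_rot * dt)    # rotational noise amplitude
    x_new = x + speed * math.cos(theta) * dt + sigma_t * rng.gauss(0.0, 1.0)
    y_new = y + speed * math.sin(theta) * dt + sigma_t * rng.gauss(0.0, 1.0)
    theta_new = theta + sigma_r * rng.gauss(0.0, 1.0)
    return x_new, y_new, theta_new


def simulate(n_steps, speed, d_trans, d_rot, dt, seed=0):
    """Integrate a trajectory from the origin with initial heading along +x."""
    rng = random.Random(seed)
    x, y, theta = 0.0, 0.0, 0.0
    for _ in range(n_steps):
        x, y, theta = abp_step(x, y, theta, speed, d_trans, d_rot, dt, rng)
    return x, y, theta


if __name__ == "__main__":
    # Hypothetical reduced units: compare a noise-free and a noisy swimmer.
    print(simulate(1000, speed=1.0, d_trans=0.0, d_rot=0.0, dt=0.01))
    print(simulate(1000, speed=1.0, d_trans=0.1, d_rot=1.0, dt=0.01))
```

With both diffusion coefficients set to zero the particle moves in a straight line. Increasing them relative to the swim speed randomizes the heading and position, which is the regime where directed motion towards a target becomes physically difficult.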

Paper Structure (30 sections, 31 equations, 16 figures, 1 table)

Introduction
Theory
Biological Chemotaxis
Actor-Critic Reinforcement Learning
Multi-Agent Reinforcement Learning
Methods
SwarmRL
ESPResSo Simulations
Reinforcement Learning Parameters
Agent Definition
Computational Methods
Results
Probability of Emergent Chemotaxis
Learning Efficiency
Policy Efficiency
...and 15 more sections

Figures (16)

Figure 1: Representation of actor-critic reinforcement learning architectures.
Figure 2: Graphical representation of the three agent shapes considered in this study: the sphere (center), prolate (right), and oblate (left). In each case, the volume of the agent is kept equal for a given radius value.
Figure 3: Probability of successful chemotaxis emerging from RL studies. Raw data from the experiment. The colour of each point corresponds to the number of RL simulations that successfully learned how to perform chemotaxis. The green lines indicate the theoretical values at which translational (solid) and rotational (dashed) diffusion becomes dominant compared to the active motion of the agents.
Figure 4: Probability of successful chemotaxis emerging from RL studies. Raw data from the experiment. The colour of each point corresponds to the maximum reward achieved by the agents during the 10'000 episodes.
Figure 5: (left) Mean distance from the source for each swim speed and colloid size. A clear minimum in each plot suggests an optimal size dependent on swim speed. (right) Rate of convergence to the source for different swim speeds and sizes. Interestingly, the convergence rate of larger colloids is relatively similar, suggesting some redundancy in larger body sizes and swim speeds.
...and 11 more figures
