Learning optimal integration of spatial and temporal information in noisy chemotaxis

Albert Alonso; Julius B. Kirkegaard

TL;DR

This work investigates how cells optimally combine spatial gradient sensing with temporal information during noisy chemotaxis. It employs a recurrent deep reinforcement learning policy to learn a combined spatial-temporal strategy within a minimal 2D cell model, revealing a continuous transition from temporal to spatial dominance as cell size varies, and a superior combined policy in the transition region. Integrated gradients show that this policy uses a nontrivial, dynamically evolving integration of cues rather than a simple sum of known strategies. The findings illuminate how flexible sensing-and-decision schemes can outperform fixed strategies in intermediate regimes and offer a framework to explore memory-augmented navigation in noisy chemical landscapes.

Abstract

We investigate the boundary between chemotaxis driven by spatial estimation of gradients and chemotaxis driven by temporal estimation. While it is well known that spatial chemotaxis becomes disadvantageous for small organisms at high noise levels, it is unclear whether the optimal strategy switches discontinuously or whether a continuous transition exists. Here, we employ deep reinforcement learning to study the possible integration of spatial and temporal information in an a priori unconstrained manner. We parameterize such a combined chemotactic policy by a recurrent neural network and evaluate it using a minimal theoretical model of a chemotactic cell. By comparing with constrained variants of the policy, we show that it converges to purely temporal and spatial strategies at small and large cell sizes, respectively. We find that the transition between the regimes is continuous, with the combined strategy outperforming, in the transition region, both the constrained variants and models that explicitly integrate spatial and temporal information. Finally, by utilizing the attribution method of integrated gradients, we show that the policy relies on a non-trivial combination of spatially and temporally derived gradient information in a ratio that varies dynamically during the chemotactic trajectories.
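
The attribution method named in the abstract, integrated gradients, can be illustrated on a toy function. The sketch below is not the authors' network or code: `f` is a hypothetical stand-in for a scalar policy output, the weights `w`, the input `x` and the all-zero `baseline` are arbitrary illustrative choices, and the path integral is approximated with a midpoint Riemann sum.

```python
import math

def f(x, w):
    """Toy scalar output: tanh of a weighted sum (hypothetical stand-in for a policy)."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))

def grad_f(x, w):
    """Analytic gradient of f with respect to each input component."""
    s = 1.0 - f(x, w) ** 2  # derivative of tanh at the weighted sum
    return [s * wi for wi in w]

def integrated_gradients(x, baseline, w, steps=200):
    """Attribution_i = (x_i - b_i) * average gradient_i along the straight path b -> x."""
    n = len(x)
    avg = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(point, w)
        for i in range(n):
            avg[i] += g[i] / steps
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg)]

# Illustrative values only (assumptions, not taken from the paper).
w = [0.8, -0.3, 0.5]
x = [1.0, 2.0, -0.5]
baseline = [0.0, 0.0, 0.0]
attr = integrated_gradients(x, baseline, w)
```

A useful sanity check is the completeness property: the attributions sum (up to discretization error) to `f(x, w) - f(baseline, w)`, so each entry of `attr` can be read as that input's share of the change in output.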

Paper Structure (10 sections, 15 equations, 7 figures)

Introduction
Methods
The simulation model
The policy
Results
Optimizing for noise-robust strategies
Smooth transition between a temporal and a spatial strategy
Integrating temporal and spatial information
Discussion
Comparison with interpretable models

Figures (7)

Figure 1: A Representation of the model cell with five sensors surrounded by chemoattractant particles. Each sensor measures the number of particles $M_i$ inside its sensing range $r_s$ and transforms it as $m_i=\log(M_i+1)$. B Illustration of the simulation environment where the cell navigates towards the centre of the chemoattractant source. C Phase space diagram of cell sizes and speeds showing the distribution of common unicellular prokaryotes and eukaryotes. The dashed line roughly indicates the binary division between temporal and spatial navigation strategies [wanOriginsEukaryoticExcitability2021]. Data from Refs. [wanOriginsEukaryoticExcitability2021, rodriguesBankSwimmingOrganisms2021]. D Our three neural network policies output the cell's action based on the measurements and hidden states. The combined policy has access to the individual measurements of its sensors and has a hidden state used in a recurrent neural network layer, whereas the spatial and temporal policies each have only one of these features. Dense: a linear NN layer connecting all inputs with all outputs. MLP: Multilayer perceptron, a sequence of dense layers with non-linear activations. GRU: Gated recurrent unit, a simple form of recurrent neural network module, which combines a hidden state with new input. The policy output of the model is both a mean value $\mu_t$ and a standard deviation $\sigma_t$, which define a normal distribution from which an action $a_t$ is sampled. In our experiments, $\sigma_t \rightarrow 0$ at the end of training (see SI), resulting in deterministic policies.
Figure 2: A-B Example trajectories of the learned strategies at $R=2\upmu\textrm{m}$ for each variant, in nutrient-rich ($C_0/C_q = 10^4$) and nutrient-depleted ($C_0/C_q=1$) environments, respectively. Circles indicate $\delta = 10 \, \upmu \mathrm{m}$. C-D Measurement values of each sensor of the cell for the trajectories in A and B, respectively. Each color represents one of the $K=5$ sensors used in the trajectories. The measurements correspond to those of the Combined cell.
Figure 3: A Chemotactic efficiency of each variant in reaching the source as a function of cell size. Each value is the result of training and evaluating the policies at that cell radius for sampled values of $C_0$. The average efficiency is evaluated on $2^{16}$ independent runs. A "blind" agent obtains efficiency $\eta \approx 0.02$. B Distribution of arrival times to the source of the three cell variants at $R=2\upmu\textrm{m}$. All evaluations use sampled concentrations.
Figure 4: A Average memory usage contribution to the steering output during the simulation runs at different sizes and concentration levels $C_0$. The dashed line indicates $U_h \approx 0.5$, i.e. the transition from a memory-dominated strategy to a more reactive sensing-based policy. B Distribution of memory usage $U_h$ values during individual trajectories, evaluated at different distances to the source. $R=2 \, \upmu\textrm{m}$.
Figure 5: A Contribution of each sensor from past time measurements to the current action. The three variants at $R=2 \, \upmu\textrm{m}$ are shown, with data coloured by sensor position as indicated in the cell diagram. For the temporal variant (dashed), only one sensor is shown, as all have the same profile by the designed symmetry. The red arrow indicates the swimming direction. Curves are obtained by averaging over $\sim 10^5$ trajectories with initial conditions sampled as in the previous plots. B Sensor contributions to the combined policy for different cell sizes. C Trajectory visualization of both small (top) and large (bottom) cells. See SI for a plot with all three variants.
...and 2 more figures
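
The sensing and action-sampling steps described in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' simulation: the even spacing of the five sensors on the cell perimeter, the uniformly scattered particles, the function names, and all numerical values are hypothetical; only the counting within $r_s$, the transform $m_i=\log(M_i+1)$, and sampling $a_t$ from a normal distribution with mean $\mu_t$ and standard deviation $\sigma_t$ come from the caption.

```python
import math
import random

def sensor_positions(cell_xy, radius, heading, k=5):
    """Place k sensors evenly on the cell perimeter (assumed layout, not from the paper)."""
    cx, cy = cell_xy
    return [
        (cx + radius * math.cos(heading + 2 * math.pi * i / k),
         cy + radius * math.sin(heading + 2 * math.pi * i / k))
        for i in range(k)
    ]

def measure(sensors, particles, r_s):
    """Count particles M_i within r_s of each sensor, then return m_i = log(M_i + 1)."""
    out = []
    for sx, sy in sensors:
        count = sum(1 for px, py in particles
                    if math.hypot(px - sx, py - sy) <= r_s)
        out.append(math.log(count + 1))
    return out

def sample_action(mu, sigma, rng):
    """Draw a_t ~ Normal(mu_t, sigma_t); sigma_t = 0 yields the deterministic action mu_t."""
    return rng.gauss(mu, sigma) if sigma > 0 else mu

# Illustrative values only (assumptions): uniform particle cloud, arbitrary units.
rng = random.Random(0)
particles = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(500)]
sensors = sensor_positions((0.0, 0.0), radius=2.0, heading=0.0)
m = measure(sensors, particles, r_s=1.0)
a = sample_action(mu=0.1, sigma=0.0, rng=rng)
```

The logarithmic transform compresses large particle counts, and an empty sensing region maps to exactly zero because $\log(0+1)=0$.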
