Learning optimal integration of spatial and temporal information in noisy chemotaxis
Albert Alonso, Julius B. Kirkegaard
TL;DR
This work investigates how cells optimally combine spatial gradient sensing with temporal information during noisy chemotaxis. It employs a recurrent deep reinforcement learning policy to learn a combined spatial-temporal strategy within a minimal 2D cell model, revealing a continuous transition from temporal to spatial dominance as cell size varies, and a superior combined policy in the transition region. Integrated gradients show that this policy uses a nontrivial, dynamically evolving integration of cues rather than a simple sum of known strategies. The findings illuminate how flexible sensing-and-decision schemes can outperform fixed strategies in intermediate regimes and offer a framework to explore memory-augmented navigation in noisy chemical landscapes.
Abstract
We investigate the boundary between chemotaxis driven by spatial estimation of gradients and chemotaxis driven by temporal estimation. While it is well known that spatial chemotaxis becomes disadvantageous for small organisms at high noise levels, it is unclear whether there is a discontinuous switch of optimal strategies or a continuous transition exists. Here, we employ deep reinforcement learning to study the possible integration of spatial and temporal information in an a priori unconstrained manner. We parameterize such a combined chemotactic policy by a recurrent neural network and evaluate it using a minimal theoretical model of a chemotactic cell. By comparing with constrained variants of the policy, we show that it converges to purely temporal and spatial strategies at small and large cell sizes, respectively. We find that the transition between the regimes is continuous, with the combined strategy outperforming in the transition region both the constrained variants as well as models that explicitly integrate spatial and temporal information. Finally, by utilizing the attribution method of integrated gradients, we show that the policy relies on a non-trivial combination of spatially and temporally derived gradient information in a ratio that varies dynamically during the chemotactic trajectories.
