Short term vs. long term: optimization of microswimmer navigation on different time horizons
Navid Mousavi, Jingran Qiu, Lihao Zhao, Bernhard Mehlig, Kristian Gustavsson
TL;DR
The paper tackles how microswimmers can efficiently avoid high-strain regions in turbulent flows by using local cues. It combines reinforcement learning (RL) with an analytical short- and long-horizon framework to identify which signals (the strain magnitude ${\mathcal{S}}^2$ and its gradients) are most effective for navigation, and how the optimal strategy depends on the time horizon and update frequency. Gradients of squared strain emerge as particularly powerful signals, with policies that align propulsion and rotation to decrease encountered strain, and the best performance occurs when the optimization horizon matches the flow correlation time $\tau_f$. The results bridge short-time expansions and RL, revealing that gradient-based strategies are robust across models and highlighting implications for biological swimmers and artificial microrobots operating in turbulent environments.
Abstract
We use reinforcement learning to find strategies that allow microswimmers in turbulence to avoid regions of large strain. This question is motivated by the hypothesis that swimming microorganisms tend to avoid such regions to minimise the risk of predation. We ask which local cues a microswimmer must measure to efficiently avoid such straining regions. We find that it can succeed without directional information, merely by measuring the magnitude of the local strain. However, the swimmer avoids straining regions more efficiently if it can measure the sign of local strain gradients. We compare our results with those of an earlier study [Mousavi et al., Phys. Rev. Res. 6, L022034 (2024)] where a short-time expansion was used to find optimal strategies. We find that the short-time strategies work well in some cases but not in others. We derive a new theory that explains when the time-horizon matters for our optimisation problem, and when it does not. We find the strategy with best performance when the time-horizon coincides with the correlation time of the turbulent fluctuations. We also explain how the update frequency (the frequency at which the swimmer updates its strategy) affects the found strategies. We find that higher update frequencies yield better performance.
