Short term vs. long term: optimization of microswimmer navigation on different time horizons

Navid Mousavi; Jingran Qiu; Lihao Zhao; Bernhard Mehlig; Kristian Gustavsson

TL;DR

The paper asks how microswimmers can use local cues to avoid high-strain regions in turbulent flows. It combines reinforcement learning (RL) with an analytical theory for short and long time horizons to identify which signals (the squared strain ${\mathcal{S}}^2$ and its gradients) are most effective for navigation, and how the optimal strategy depends on the time horizon and the update frequency. Gradients of the squared strain emerge as particularly powerful signals, with policies that adjust propulsion and rotation to reduce the strain encountered; performance is best when the optimization horizon matches the flow correlation time $\tau_{\rm f}$. The results connect short-time expansions with RL, indicate that gradient-based strategies are robust across models, and have implications for biological swimmers and artificial microrobots operating in turbulent environments.

Abstract

We use reinforcement learning to find strategies that allow microswimmers in turbulence to avoid regions of large strain. This question is motivated by the hypothesis that swimming microorganisms tend to avoid such regions to minimise the risk of predation. We ask which local cues a microswimmer must measure to efficiently avoid such straining regions. We find that it can succeed without directional information, merely by measuring the magnitude of the local strain. However, the swimmer avoids straining regions more efficiently if it can measure the sign of local strain gradients. We compare our results with those of an earlier study [Mousavi et al., Phys. Rev. Res. 6, L022034 (2024)] where a short-time expansion was used to find optimal strategies. We find that the short-time strategies work well in some cases but not in others. We derive a new theory that explains when the time horizon matters for our optimisation problem, and when it does not. We find the strategy with best performance when the time horizon coincides with the correlation time of the turbulent fluctuations. We also explain how the update frequency (the frequency at which the swimmer updates its strategy) affects the found strategies. We find that higher update frequencies yield better performance.

Paper Structure (26 sections, 24 equations, 8 figures, 4 tables)

Introduction
Methods
Swimmer model
Turbulent-flow models
Active control
Non-dimensional parameters
Reinforcement learning
Short-time expansion
Numerical results
Navigation based on strain magnitude
Navigation based on squared strain and its derivatives
Approximate theory for optimization on different time horizons
Theory
Application to squared strain
Application to gradients of squared strain
...and 11 more sections

Figures (8)

Figure 1: Illustration of a spheroidal microswimmer with swimming direction ${\hat{\boldsymbol{n}}}$, direction of antennae ${\hat{\boldsymbol{p}}}$, and ${\hat{\boldsymbol{q}}}={\hat{\boldsymbol{n}}}\times{\hat{\boldsymbol{p}}}$, in a fixed Cartesian frame of reference ($\hat{\boldsymbol{x}}$, $\hat{\boldsymbol{y}}$, $\hat{\boldsymbol{z}}$). Curved arrows denote the positive directions of the angular swimming velocities $\boldsymbol{\omega^{({\rm s})}_p}$ and $\boldsymbol{\omega^{({\rm s})}_q}$.
Figure 2: Comparison of numerical results for the average ${{\mathcal{S}}^2}$ against average swimming speed $\Phi_\eta$ for swimming strategies (a) excluding and (b) including strain gradients as signals. (a) Results for reinforcement learning (RL) in Table \ref{tab:strategy} for cruising swimmers in the statistical model (green, $\blacksquare$) and jumping ones in DNS ($\Box$), and for the analytical strategy in Eq. (\ref{eq:optimal_policy_TrSSqr}) for cruising (red, $\bullet$) and jumping ($\circ$). (b) Results for RL in Table \ref{tab:strategy_general} for cruising (green, $\blacksquare$) and jumping ($\Box$) swimmers, and for Eq. (\ref{eq:optimal_policy}) evaluated for cruising swimmers (red, $\bullet$) and jumping swimmers disregarding the signal $X^{(q)}_\perp$ ($\circ$). Parameters: $\Xi=6.25$ for cruising swimmers and $\Xi=39$ for jumping ones, and $\lambda=2$.
Figure 3: (a) Time-averaged coefficient $\overline{C_\sigma}(T_{\rm p})$ (see text) against swimming velocity for different prediction horizons $T_{\rm p}$ with rotational swimming ($\omega^{({\rm s})}\tau_{{\rm f}}=5$, dashed lines) and without ($\omega^{({\rm s})}=0$, solid lines). (b) Average squared strain, $\langle{{\mathcal{S}}(t)^2}\rangle=\langle\hbox{tr}(\mathbb{S}(t)^2)\rangle\tau_{\eta}^2$, against $T_{\rm p}$, following the optimal strategy. The solid line shows an analytical evaluation of the predicted time-averaged strain (see text). Markers show the time average from numerical simulations of swimmers following Eq. (\ref{eq:optimal_policy_TrSSqr}). Results are obtained by either choosing the initial position randomly (red, $\circ$), or by sequentially measuring the signal and updating the control each time interval $T_{\rm u}=T_{\rm p}$ (red, $\bullet$). Parameters: $v^{({\rm s})}_{\rm max}\tau_{{\rm f}}/\ell_{{\rm f}}=1$, $\omega^{({\rm s})}_{\rm max}\tau_{{\rm f}}=5$, $\lambda=1$, and $\mathop{\mathrm{Ku}}\nolimits=0.1$.
Figure 4: Navigation based on derivatives of squared strain. (a--d) Optimal strategy for choosing (a, b) $v^{({\rm s})}$ and (c, d) $\omega^{({\rm s})}$ based on the signals (a, c) $X_\parallel$ and (b, d) $X_\perp$ for different prediction horizons $T_{\rm p}$ (solid lines). Data are obtained by evaluating the time average of Eqs. (\ref{eq:TrSSqrConditionalX}) and (\ref{eq:TrSSqrConditionalY}) for a discrete set of $v^{({\rm s})}$ and $\omega^{({\rm s})}_q$, and choosing the $v^{({\rm s})}_{\rm opt}$ and $\omega^{({\rm s})}_{\rm opt}$ that give the smallest average squared strain. Thick black lines show the small-$T_{\rm p}$ limit in Eqs. (\ref{eq:optimal_policy_X}) and (\ref{eq:optimal_policy_Y}). (e) Same as Fig. \ref{fig:GaussConditionalStrain}(b) for the signals $X_\parallel={\hat{\boldsymbol{n}}}\cdot \boldsymbol{\nabla}\hbox{tr}(\mathbb{S}^2)\tau_{\eta}^2\ell_{{\rm f}}$ and $X_\perp={\hat{\boldsymbol{p}}}\cdot \boldsymbol{\nabla}\hbox{tr}(\mathbb{S}^2)\tau_{\eta}^2\ell_{{\rm f}}$, i.e. showing theory (solid lines), results for uniform initial positions (hollow markers), and results for sequential updates on the time scale $T_{\rm u}=T_{\rm p}$ (filled markers). (f) Average squared strain following the strategy that optimizes the time average Eq. (\ref{eq:TimeAverageStrain}) based on Eq. (\ref{eq:TrSSqrConditionalGeneral}) for general values of initial strain and strain gradients against different time scales $T$. The solid line shows results for Gaussian distributed initial flow components with prediction horizon $T_{\rm p}=T$. Hollow and filled markers with $T_{\rm p}=T_{\rm u}=T$ are the same as in panel (e). Also shown are results with small $T_{\rm p}$ ($T_{\rm p}=0.01\tau_{{\rm f}}$, $T_{\rm u}=T$) and small $T_{\rm u}$ ($T_{\rm u}=0.01\tau_{{\rm f}}$, $T_{\rm p}=T$). Parameters: $v^{({\rm s})}_{\rm max}\tau_{{\rm f}}/\ell_{{\rm f}}=1$, $\omega^{({\rm s})}_{\rm max}\tau_{{\rm f}}=5$, $\lambda=1$, and $\mathop{\mathrm{Ku}}\nolimits=0.1$.
Figure 5: Distribution of ${{\mathcal{S}}^2}$ in the statistical model for swimmers following Eq. (\ref{eq:optimal_policy_TrSSqr}) with different thresholds $\sigma_{\rm c}^2=0$, $0.25$, $0.5$, and $0.75$ (green curves). The distribution for tracer particles is shown as a black dashed line with $\star$ markers. Same parameters as in Fig. \ref{fig:RL_performance}(a) and $\Phi_\eta=30$.
...and 3 more figures
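
The captions of Figures 3 and 4 define the signals used for navigation: the dimensionless squared strain $\hbox{tr}(\mathbb{S}^2)\tau_{\eta}^2$ and the gradient signals $X_\parallel$ and $X_\perp$, obtained by projecting $\boldsymbol{\nabla}\hbox{tr}(\mathbb{S}^2)\tau_{\eta}^2\ell_{{\rm f}}$ onto ${\hat{\boldsymbol{n}}}$ and ${\hat{\boldsymbol{p}}}$. As a rough illustration only, the Python sketch below evaluates these quantities for a given velocity-gradient field. The function names, the central-difference step `h`, and the toy flow are assumptions made for this example; they are not taken from the paper, and the sketch does not reproduce its flow models or its policies.

```python
import numpy as np

def strain_rate(grad_u):
    """Symmetric part S of a 3x3 velocity-gradient matrix A_ij = du_i/dx_j."""
    return 0.5 * (grad_u + grad_u.T)

def squared_strain(grad_u, tau_eta=1.0):
    """Dimensionless squared strain tr(S^2) * tau_eta^2."""
    S = strain_rate(grad_u)
    return np.trace(S @ S) * tau_eta**2

def gradient_signals(grad_u_at, x, n_hat, p_hat, tau_eta=1.0, ell_f=1.0, h=1e-4):
    """Return (X_par, X_perp): directional derivatives of tr(S^2) tau_eta^2
    along n_hat and p_hat, multiplied by ell_f.

    grad_u_at is a hypothetical callable mapping a position (3-vector) to the
    3x3 velocity-gradient matrix. Central differences with step h are an
    assumption of this sketch.
    """
    def directional(d):
        plus = squared_strain(grad_u_at(x + h * d), tau_eta)
        minus = squared_strain(grad_u_at(x - h * d), tau_eta)
        return (plus - minus) / (2.0 * h) * ell_f
    return directional(n_hat), directional(p_hat)

def toy_grad_u(x):
    """Velocity gradient of the toy incompressible flow u = (x^2, -2xy, 0).
    Hypothetical test field, chosen because tr(S^2) = 8x^2 + 2y^2 is known."""
    return np.array([[2.0 * x[0], 0.0, 0.0],
                     [-2.0 * x[1], -2.0 * x[0], 0.0],
                     [0.0, 0.0, 0.0]])

position = np.array([1.0, 2.0, 0.0])
n_hat = np.array([1.0, 0.0, 0.0])   # swimming direction
p_hat = np.array([0.0, 1.0, 0.0])   # antenna direction

s2 = squared_strain(toy_grad_u(position))
x_par, x_perp = gradient_signals(toy_grad_u, position, n_hat, p_hat)
print(s2, x_par, x_perp)
```

For the toy flow, $\hbox{tr}(\mathbb{S}^2)=8x^2+2y^2$, so at the chosen position the squared strain is 16 and its gradient is $(16, 8, 0)$. Both signals are positive, meaning the strain increases along ${\hat{\boldsymbol{n}}}$ and ${\hat{\boldsymbol{p}}}$.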
