Q-learning with temporal memory to navigate turbulence
Marco Rando, Martin James, Alessandro Verri, Lorenzo Rosasco, Agnese Seminara
TL;DR
This work tackles the problem of locating an odor source in a turbulent plume using only odor cues, with no spatial perception or prior information. It introduces a map-free reinforcement learning approach that uses a small, interpretable olfactory state space derived from a temporal memory window over odor traces, discretizing intermittency $\bar{i}$ and intensity $\bar{c}$ to guide actions via tabular Q-learning. A key finding is the existence of an optimal memory length $T^*$ that balances ignoring plume blanks against triggering recovery outside the plume; a learned recovery policy and adaptive memory $T=\tau_b^-$ yield cast-and-surge-like behavior and robust generalization across multiple environments. The results suggest that temporal features of odor signals can robustly drive navigation in highly intermittent turbulence, with implications for understanding insect behavior and for designing odor-guided autonomous systems.
Abstract
We consider the problem of olfactory searches in a turbulent environment. We focus on agents that respond solely to odor stimuli, with no access to spatial perception or prior information about the odor. We ask whether navigation to a target can be learned robustly within a sequential decision-making framework. We develop a reinforcement learning algorithm using a small set of interpretable olfactory states and train it with realistic turbulent odor cues. By introducing a temporal memory, we demonstrate that two salient features of odor traces, discretized into a few olfactory states, are sufficient to learn navigation in a realistic odor plume. Performance is dictated by the sparse nature of turbulent odors. An optimal memory exists that ignores blanks within the plume and activates a recovery strategy outside the plume. We obtain the best performance by letting agents learn their recovery strategy and show that it consists mostly of casting crosswind, similar to behavior observed in flying insects. The optimal strategy is robust to substantial changes in the odor plumes, suggesting that minor parameter tuning may be sufficient to adapt to different environments.
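To make the idea of olfactory states built from a temporal memory concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that intermittency is the fraction of readings in the memory window above a detection threshold, and that intensity is the mean concentration over those detections. The names `OlfactoryMemory`, `VOID`, `ACTIONS`, `new_q_table` and `q_update`, the bin edges, the action set, and the learning parameters `alpha` and `gamma` are all hypothetical choices made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict, deque

VOID = "void"  # hypothetical label: no detection anywhere in the memory window
ACTIONS = ("upwind", "downwind", "cross_left", "cross_right")  # assumed action set


class OlfactoryMemory:
    """Sliding window over the most recent odor readings (illustrative helper)."""

    def __init__(self, memory_length, threshold, intermittency_edges, intensity_edges):
        self.window = deque(maxlen=memory_length)  # old readings drop out automatically
        self.threshold = threshold                 # assumed detection threshold
        self.intermittency_edges = intermittency_edges  # sorted bin edges (assumed)
        self.intensity_edges = intensity_edges          # sorted bin edges (assumed)

    def observe(self, concentration):
        self.window.append(concentration)

    def state(self):
        detections = [c for c in self.window if c > self.threshold]
        if not detections:
            # Nothing detected within memory: the agent is in the void state,
            # which is where a recovery strategy would apply.
            return VOID
        # Fraction of stored readings above threshold.
        intermittency = len(detections) / len(self.window)
        # Mean concentration, conditioned on detection.
        intensity = sum(detections) / len(detections)
        # Discretize both features into a small number of bins.
        return (bisect_right(self.intermittency_edges, intermittency),
                bisect_right(self.intensity_edges, intensity))


def new_q_table():
    """Tabular Q-function: one value per (state, action), initialized to zero."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Standard one-step Q-learning update; returns the updated value."""
    best_next = max(q_table[next_state].values())
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]
```

In this sketch the memory length plays the role described in the abstract: a short window falls into the void state during blanks inside the plume, while a long window delays the void state, and hence recovery, after the agent has left the plume.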
