Optimal foraging strategies can be learned

Gorka Muñoz-Gil, Andrea López-Incera, Lukas J. Fiderer, Hans J. Briegel

TL;DR

This work proves theoretically that maximizing rewards in the RL model is equivalent to optimizing foraging efficiency and shows with numerical experiments that, in the paradigmatic model of non-destructive search, agents learn foraging strategies which outperform the efficiency of some of the best known strategies such as Lévy walks.

Abstract

The foraging behavior of animals is a paradigm of target search in nature. Understanding which foraging strategies are optimal and how animals learn them are central challenges in modeling animal foraging. While the question of optimality has wide-ranging implications across fields such as economics, physics, and ecology, the question of learnability is a topic of ongoing debate in evolutionary biology. Recognizing the interconnected nature of these challenges, this work addresses them simultaneously by exploring optimal foraging strategies through a reinforcement learning framework. To this end, we model foragers as learning agents. We first prove theoretically that maximizing rewards in our reinforcement learning model is equivalent to optimizing foraging efficiency. We then show with numerical experiments that, in the paradigmatic model of non-destructive search, our agents learn foraging strategies which outperform the efficiency of some of the best known strategies such as Lévy walks. These findings highlight the potential of reinforcement learning as a versatile framework not only for optimizing search strategies but also for modeling the learning process, thus shedding light on the role of learning in natural optimization processes.

Paper Structure

This paper contains 6 sections, 10 equations, 9 figures, and 1 table.

Figures (9)

  • Figure 1: Schematic illustration of random-walk-based foraging strategies. According to the hypothesis of learning-based movement, biological entities can adapt and learn in order to optimize their search efficiency, i.e., the number of targets collected per unit time [lewis2021learning]. Different animals, such as albatrosses [viswanathan1996levy, de2017early], bison [sigaud2017collective], bumblebees [lihoreau2012radar, leadbeater2009bumble], deer [focardi2009adaptive], or bats [vilk2022phase], may learn different strategies based on their cognitive capacities, surrounding environment, or biological pressure [wosniack2017evolutionary].
  • Figure 2: The problem of non-destructive foraging is formulated within the framework of RL. An agent moves through an environment with randomly distributed targets and, at each step, chooses between two possible actions: continue in the same direction ($\uparrow$) or turn ($\Rsh$) in a random direction. The state perceived by the agent is a counter $n$, which is the number of small steps of length $d$ which compose the current step of length $L$. Whenever the agent detects a target, it receives a reward $R$ and resumes its walk at a distance $l_\mathrm{c}$ from the detected target with the counter reset.
  • Figure 3: Learning curves and the advantage of learned policies over benchmarks. (a) The search efficiency (averaged over 10 agents, displayed with one standard deviation) is shown over the course of learning (measured in training episodes). Different colors correspond to environments with different cutoff lengths $l_\mathrm{c}$. Efficiencies are normalized by the efficiency of the respective best benchmark, which turns out to be a bi-exponential distribution in all cases. Dashed lines show the efficiency of the best Lévy walk for each case. (b) Comparison between the best agent's search efficiency at the end of learning and that of the best benchmarks, for each environment. The efficiency of the best Lévy walk for $l_\mathrm{c}=1$ is $\eta_{\textrm{L\'evy}} / \eta_\textrm{bi-exp.} = 0.88$. For each agent and benchmark model, the efficiency is averaged over $2\cdot10^8$ RL steps. In panel (b), the standard errors of the mean for the benchmark models and the best agents are depicted but are too small to be visible.
  • Figure 4: Policy of an RL agent trained in an environment with cutoff length $l_\mathrm{c}=0.6$. The policies corresponding to the best bi-exponential (solid line, $d_1=0.15, \omega_1=0.96, d_2=13047.89$) and the best Lévy distributions (dashed line, $\beta=0.64$) are shown for comparison. The normalized search efficiencies are $\eta_{\textrm{RL}}/\eta_{\textrm{bi-exp}} = 1.02$ and $\eta_{\textrm{L\'evy}}/\eta_{\textrm{bi-exp}}=0.85$. The grey dotted line marks the initialization policy $\pi_0(\Rsh|n)=0.01 \;\forall n$.
  • Figure 5: Analysis of the learned policies for different cutoff lengths. (a) Learned policies $\pi(\Rsh|n)$ as a function of the counter $n$. Each point is the average over 10 agents and the shaded area represents one standard deviation. The inset shows, on the left axis, the turning probability at $n=l_\mathrm{c}$ averaged over 10 agents; error bars represent one standard deviation. On the right axis, the probability $p_\Rsh$ of hitting the target when turning at $n=l_\mathrm{c}$ is shown (see \ref{app:geometry}). (b) Step-length distributions corresponding to the policies presented in (a). Each point is the median over 10 agents.
  • ...and 4 more figures
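
The captions of Figures 4 and 5 compare turning policies $\pi(\Rsh|n)$ with step-length distributions. A minimal sketch of how one can be converted into the other is given below. It rests on assumptions not stated in the captions: the counter is taken to start at $n=1$, the agent decides once per counter value, and a step of length $L = n\,d$ ends when the agent turns at counter $n$. Under these assumptions the turning probability is the discrete hazard rate of the step-length distribution. The function names `policy_to_step_pmf` and `step_pmf_to_policy` are hypothetical and used for illustration only.

```python
import numpy as np


def policy_to_step_pmf(turn_prob):
    """Map a turning policy to a step-length distribution.

    turn_prob[i] is pi(turn | n = i + 1) (assumed indexing convention).
    Returns P(L = n) for n = 1..N, in units of the small step length d.
    """
    turn_prob = np.asarray(turn_prob, dtype=float)
    # Probability of having continued straight at every earlier counter value.
    survive = np.concatenate(([1.0], np.cumprod(1.0 - turn_prob[:-1])))
    return turn_prob * survive


def step_pmf_to_policy(pmf):
    """Inverse map: the turning probability is the hazard rate of the pmf."""
    pmf = np.asarray(pmf, dtype=float)
    # P(L >= n) = 1 - sum_{k < n} P(L = k)
    tail = 1.0 - np.concatenate(([0.0], np.cumsum(pmf[:-1])))
    return pmf / tail


# Constant initialization policy pi_0(turn | n) = 0.01 from the Figure 4 caption,
# truncated at N = 1000 counter values (the truncation is an assumption).
pi0 = np.full(1000, 0.01)
pmf = policy_to_step_pmf(pi0)
recovered = step_pmf_to_policy(pmf)
```

With the constant policy the resulting step lengths are geometrically distributed, and applying the inverse map recovers the original policy. Because the counter is truncated at a finite `N`, the returned probabilities sum to slightly less than one; the remaining mass corresponds to steps longer than `N`.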