Learning to reset in target search problems

Gorka Muñoz-Gil, Hans J. Briegel, Michele Caraglio

TL;DR

The paper tackles target search under resetting by introducing a reinforcement-learning framework that enables agents to learn when to reset and how to move. In 1D Brownian search, agents reproduce the optimal sharp resetting interval $\tau^*$, while in 2D they discover joint reset-turn strategies that outperform baselines, including exponential resetting. The work provides a scalable, interpretable approach that not only optimizes search efficiency but also reveals new strategies (Turn-Reset) that adapt to geometry. By bridging ML optimization with stochastic search theory, the framework offers practical guidance for designing adaptive search policies in uncertain environments.

Abstract

Target search problems are central to a wide range of fields, from biological foraging to optimization algorithms. Recently, the ability to reset the search has been shown to significantly improve the searcher's efficiency. However, the optimal resetting strategy depends on the specific properties of the search problem and can often be challenging to determine. In this work, we propose a reinforcement learning (RL)-based framework to train agents capable of optimizing their search efficiency in environments by learning how to reset. First, we validate the approach in a well-established benchmark: the Brownian search with resetting. There, RL agents consistently recover strategies closely resembling the sharp resetting distribution, known to be optimal in this scenario. We then extend the framework by allowing agents to control not only when to reset, but also their spatial dynamics through turning actions. In this more complex setting, the agents discover strategies that adapt both resetting and turning to the properties of the environment, outperforming the proposed benchmarks. These results demonstrate how reinforcement learning can serve both as an optimization tool and a mechanism for uncovering new, interpretable strategies in stochastic search processes with resetting.
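To make the benchmark concrete, the following is a minimal sketch of a 1D Brownian search with sharp resetting, where the walker returns to the origin after a fixed number of steps. It is an illustration and not the authors' code: the function name `sharp_reset_efficiency`, the step standard deviation $\sqrt{2D}$, the default value of `D`, and the choice to restart from the origin after each target hit are all assumptions made here. Efficiency is taken as the number of targets found divided by the episode length $T$.

```python
import random

def sharp_reset_efficiency(L, tau, T=5000, D=0.5, seed=0):
    """Targets found per step for a 1D walker that resets every `tau` steps.

    Hypothetical helper for illustration; parameter defaults are assumptions.
    """
    rng = random.Random(seed)
    sigma = (2.0 * D) ** 0.5          # assumed step std for diffusion coefficient D
    x, since_reset, hits = 0.0, 0, 0
    for _ in range(T):
        if since_reset >= tau:
            # Reset action: relocate to the origin and restart the clock.
            x, since_reset = 0.0, 0
        else:
            # Diffuse action: Gaussian step.
            x += rng.gauss(0.0, sigma)
            since_reset += 1
            if x >= L:
                hits += 1
                # Assumed here: the search restarts from the origin after a hit.
                x, since_reset = 0.0, 0
    return hits / T

if __name__ == "__main__":
    # Scan a few resetting intervals for a target at L = 5.
    for tau in (5, 20, 80, 320):
        print(tau, sharp_reset_efficiency(L=5.0, tau=tau))
```

Scanning `tau` in this way gives a rough picture of how the efficiency depends on the resetting interval; a learning agent replaces the fixed `tau` with a policy that decides at each step whether to reset.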

Paper Structure

This paper contains 11 sections, 5 figures.

Figures (5)

  • Figure 1: RL formulation of the target search problem with resetting. a) In 1D environments, the agent has two actions: diffuse (squares), where steps are sampled from a normal distribution with diffusion coefficient $D$, and reset (circles), which relocates the agent to the origin ($x=0$). A reward ($R=1$) is given upon reaching $x\geq L$. b) In 2D environments, we consider two agents: a reset agent (as in 1D) and a turn-reset agent, equipped with three actions: continue in the same direction, performing a step of constant length $d$ (triangles), turn by a random angle (blue circles), and reset (yellow circles). The target is positioned at distance $L$ from the origin and has radius $\rho$. c) The RL agent, exemplified here by a turn-reset agent, selects its next action $a$, one of continue (c), turn (t), or reset (r), based on its policy $\pi$ and the current environment state $s$, defined in this case by two counters: steps since the last turn ($c_t$) and steps since the last reset ($c_r$). See the main text for details.
  • Figure 2: Learned efficiency by the Reset agents. Overview of the efficiencies of the learning agents in 1D (panels a, b, c and d) and 2D (panels e and f) environments. a) Efficiency $\eta$ divided by the episode length $T$, as a function of the episode number, averaged over 190 agents. Each episode consists of $T = 5\cdot 10^3$ steps. b) Final efficiencies, divided by the sharp resetting efficiency $\eta_{\mathrm{sharp}}$, as a function of the target distance to the origin $L$. Each line corresponds to: average over 190 agents (solid); best agent (dashed); exponential resetting rate (dot-dashed); best agent with a policy guided with expert knowledge (dotted). c,d) Distribution of efficiencies for the 190 learning agents for $L=5$ and $10$, divided by the sharp resetting efficiency $\eta_{\mathrm{sharp}}$. e) Final efficiencies of the learning agents in the 2D environment, divided by the sharp resetting efficiency $\eta_{\mathrm{sharp}}$, as a function of the target distance $L$ divided by the target radius $\rho$. We consider here $\rho = 1$. f) Relative fluctuation of the efficiency at different $L$ for the sharp resetting strategy (solid) and the best learned strategy (triangles).
  • Figure 3: Learned policies and resetting distributions of Reset agents in 1D environments. a) Final learned policies, averaged over the 20 most efficient agents, for different target distances $L$ (blue shades). The gray area highlights the part of the policy set to zero to create the expert-guided strategies. b) Resulting resetting distribution calculated from the previous policies. Vertical lines show the optimal reset of the sharp resetting strategy. The gray area shows the same as above. c) Mean resetting time for the learned and expert-guided strategies, calculated from the previous $P_{\mathrm{reset}}(\tau)$ distribution. Solid line shows the optimal reset for the sharp resetting strategy.
  • Figure 4: Learned efficiency by Turn-reset agents. a) Efficiency of the learning agents, normalized by the efficiency of the sharp strategy. Solid line: average over 80 agents. Dashed line: best agent at each $L$. b) Efficiency at finer resolution of $L$, normalized by the best sharp strategy for each $L$: the sharp strategy with $n^* = \lfloor L \rfloor+1$ (solid red) for $L < 6.7$ and $n^* = \lfloor L \rfloor + 2$ (dashed green) for larger $L$. Blue circles show the best agent out of 80 trained for each $L$.
  • Figure 5: Learned policies by turn-reset agents. a) Every column shows the probability of performing the action continue (blue, upper panels), turn (red, center) and reset (green, lower) as a function of the current turn and reset counters ($c_t$ and $c_r$ respectively). We show here the policy of the best agent at each distance $L$ (see Figure 4). Dashed lines indicate the optimal turn and reset counters for the sharp baseline (horizontal and vertical, respectively). The horizontal dotted line marks the learned resetting time. b) Turn counter $c_t$ at which the initial turn is completed for the learning agents (blue) and the sharp strategy (red). c) Reset counter $c_r$ where the resetting action has the largest probability for the learning agents (blue). The red line shows the optimal $\tau^*$ for the sharp strategy and the black dashed line highlights a $2L$ scaling.
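
The state and action structure described in the captions of Figures 1 and 5 can be sketched as a small tabular policy over the two counters $c_t$ and $c_r$. This is only an illustration under stated assumptions: the names `update_counters` and `select_action`, the dictionary representation of the policy, the default of always continuing for unseen states, and the exact counter bookkeeping (a turn counted as a step since the last reset, a reset clearing both counters) are hypothetical choices, not details confirmed by the source.

```python
import random

ACTIONS = ("continue", "turn", "reset")

def update_counters(c_t, c_r, action):
    """Return the next (c_t, c_r) after `action` (assumed bookkeeping)."""
    if action == "continue":
        return c_t + 1, c_r + 1
    if action == "turn":
        return 0, c_r + 1   # assumed: a turn still counts as a step since the last reset
    if action == "reset":
        return 0, 0         # assumed: resetting clears both counters
    raise ValueError(f"unknown action: {action}")

def select_action(policy, c_t, c_r, rng):
    """Sample an action from policy[(c_t, c_r)], a 3-tuple of probabilities
    ordered as in ACTIONS. Unseen states default to 'continue' (assumption)."""
    probs = policy.get((c_t, c_r), (1.0, 0.0, 0.0))
    return rng.choices(ACTIONS, weights=probs, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    # Toy policy: always turn after 3 straight steps, always reset after 10 steps.
    policy = {}
    for c_r in range(11):
        for c_t in range(4):
            if c_r == 10:
                policy[(c_t, c_r)] = (0.0, 0.0, 1.0)
            elif c_t == 3:
                policy[(c_t, c_r)] = (0.0, 1.0, 0.0)
    state = (0, 0)
    for _ in range(12):
        action = select_action(policy, *state, rng)
        state = update_counters(*state, action)
        print(action, state)
```

A table of this shape corresponds to what panel a of Figure 5 displays: one probability per action for each pair $(c_t, c_r)$.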