
Homing through Reinforcement Learning

Riya Singh, Pratikshya Jena, Anish Kumar, Shradha Mishra

TL;DR

This work models homing in a 2D circular arena using a Q-learning RL agent whose state is the angular deviation from home and whose actions are deterministic alignment or stochastic reorientation. The agent minimizes a radial-distance-based cost, with a radially dependent angular threshold that directs motion toward home and a threshold-driven state update. This yields a non-monotonic dependence of the mean homing time on the rotational diffusion strength $D_r$, with an optimal value $D_r^{*}$. Extending to two and multiple agents with short-range repulsion reveals asymmetry among agents and a collective gain for the fastest agent as group size grows. Importantly, RL trajectories are shorter and less noisy than ABP trajectories, illustrating the advantage of learning-based navigation. The results connect individual and collective homing with adaptive decision-making, noise, and interactions, offering insights for biological navigation and coordinated robotics.

Abstract

Homing and navigation are fundamental behaviors in biological systems that enable agents to reliably reach a target under uncertainty. We present a Reinforcement Learning (RL) framework to model adaptive homing in a continuous two-dimensional domain. In this framework, the agent's state is given by its angular deviation from home, actions correspond to alignment or stochastic reorientation, and learning is driven by a radial-distance-based cost that penalizes motion away from the target. For a single self-propelled agent moving with constant speed, we find that the mean homing time $\langle T_{\mathrm{home}} \rangle$ exhibits a non-monotonic dependence on the rotational diffusion strength $D_r$, with an optimal noise level $D_r^{*}$, revealing a subtle interplay between exploration and goal-directed correction. Extending to two agents with soft repulsion, one agent consistently reaches home faster than the other, while in multi-agent systems, repulsion ensures separation and the fastest agent becomes progressively faster as group size increases. Finally, comparing the mean homing time $\langle T_{\mathrm{home}} \rangle$ of an Active Brownian Particle (ABP) and an RL agent over an identical range of $D_r$, we find that RL trajectories are shorter, less noisy, and consistently faster. Our results show that cost-driven learning, stochastic reorientation, and inter-agent interactions enable efficient adaptive navigation, linking individual and collective homing. This reinforcement learning framework captures key biological features such as feedback-based route learning, randomness to escape unfavorable orientations, and mutual coordination.
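
The ingredients named in the abstract (an angular-deviation state, two actions, and a radial-distance cost) can be illustrated with a minimal tabular Q-learning sketch. This is not the authors' implementation. The values $\varepsilon = 0.3$, $\alpha = 0.001$, the start position $(19, 20)$, and the home radius $2.0$ are taken from the figure captions; everything else is an assumption made for illustration: the home at the origin, the eight angular bins, the discount factor `gamma`, the speed `v0`, the time step `dt`, the cost defined as the change in radial distance per step, and the helper names `wrap` and `run_episode`. The radially dependent angular threshold $\phi(r)$ and the arena boundary are omitted because their functional forms are not given here.

```python
import math
import random


def wrap(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def run_episode(q, d_r, eps=0.3, alpha=0.001, gamma=0.9, v0=1.0, dt=0.1,
                r_home=2.0, start=(19.0, 20.0), max_steps=20000, seed=0):
    """Run one homing episode and return (homing_time, reached_home).

    q is a table q[state][action]; it is updated in place.
    Action 0 = deterministic alignment, action 1 = stochastic reorientation.
    gamma, v0, dt, and the binning are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_states = len(q)
    x, y = start
    psi = rng.uniform(-math.pi, math.pi)  # absolute heading of the agent

    def state(px, py, heading):
        # Angular deviation from the home direction (home assumed at the origin),
        # discretized into n_states equal bins.
        theta = wrap(heading - math.atan2(-py, -px))
        return min(int((theta + math.pi) / (2.0 * math.pi) * n_states), n_states - 1)

    s = state(x, y, psi)
    for step in range(1, max_steps + 1):
        # Epsilon-greedy: explore with probability eps, else pick the lowest-cost action.
        if rng.random() < eps:
            a = rng.randrange(2)
        else:
            a = 0 if q[s][0] <= q[s][1] else 1

        if a == 0:
            psi = math.atan2(-y, -x)  # point straight at home
        else:
            # Rotational diffusion kick with variance 2 * D_r * dt.
            psi = wrap(psi + math.sqrt(2.0 * d_r * dt) * rng.gauss(0.0, 1.0))

        r_old = math.hypot(x, y)
        x += v0 * math.cos(psi) * dt
        y += v0 * math.sin(psi) * dt
        r_new = math.hypot(x, y)

        cost = r_new - r_old  # positive when the step moves away from home
        s_next = state(x, y, psi)
        # Q-learning update for a cost that is minimized rather than a reward maximized.
        q[s][a] += alpha * (cost + gamma * min(q[s_next]) - q[s][a])
        s = s_next

        if r_new <= r_home:
            return step * dt, True
    return max_steps * dt, False


q_table = [[0.0, 0.0] for _ in range(8)]
t_home, reached = run_episode(q_table, d_r=1.0)
print(f"reached home: {reached}, homing time: {t_home:.1f}")
```

Averaging `t_home` over many seeds for each value of `d_r` would give an estimate of $\langle T_{\mathrm{home}} \rangle$ for this toy model only; it is not expected to reproduce the curves reported in the paper.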

Paper Structure (16 sections, 14 equations, 13 figures)


Figures (13)

  • Figure 1: (Color online) (a) Schematic of the homing model, showing the circular domain of radius $R_0$. The agent (filled blue circle) starts at $(x_0, y_0)$ in the $xy$-plane with initial orientation $\theta(0)$ (purple dashed arrow), measured from the homing direction, which points from the agent towards home (red filled circle at the center of the domain) and serves as the reference axis (red dotted arrow). (b) The agent's instantaneous orientation $\theta(t)$, measured from the home direction. (c) The angular threshold $\phi$ at three representative positions: (1) the initial point, (2) midway along the trajectory, and (3) near the home region; the same three positions are marked on the plot. The solid green line shows the evolution of the threshold angle $\phi$ about the home direction: it starts at a large value, decreases to a minimum as the agent aligns towards home, and increases slightly again on approaching the home location, as also seen in the plot of $\phi(r)$ versus distance from home. The function is evaluated from the initial position $(19, 20)$, corresponding to an initial distance $r_0 = 27.6$ from home, down to the home region at $r = 2.0$. Thus, the angular threshold gradually decreases as the agent approaches the home position, enforcing greater directional precision in the vicinity of home while maintaining sufficient freedom for exploration near home.
  • Figure 2: (Color online) Flowchart illustrating the reinforcement learning (RL) framework used for homing. The algorithm starts from the current state. At each iteration, an action is selected using an $\varepsilon$-greedy exploration--exploitation policy, and the agent's orientation and position are updated. The cost is then evaluated from the distance to the home location, and the Q-values are updated accordingly. A check determines whether the agent has reached home; if not, the procedure repeats from the updated state, otherwise the algorithm terminates.
  • Figure 3: (Color online) Mean homing time $\langle T_{\text{home}} \rangle$ vs. rotational diffusion strength $D_r$ for $\varepsilon = 0.3$ and $\alpha = 0.001$, averaged over 1500 independent realizations. The mean homing time $\langle T_{\text{home}} \rangle$ initially increases with $D_r$, then becomes nearly constant over an intermediate range, and finally, beyond an optimal value $D_r^\ast \sim 12$ (blue), begins to decrease as $D_r$ is further increased. The symbols show the data points and the dashed line is a guide to the eye, while error bars indicate the corresponding standard deviations in $\langle T_{\text{home}} \rangle$.
  • Figure 4: (Color online) (a) Mean number of resettings $\langle n \rangle$ vs. $D_r$, averaged over 1500 independent realizations. (b) Frequency of resetting $\nu = \langle n \rangle / \langle T_{\mathrm{home}} \rangle$ vs. $D_r$. (c) Mean homing time $\langle T_{\mathrm{home}} \rangle$ vs. reset frequency $\nu$. (d) Probability distribution $P(\tau)$ of time intervals between consecutive resets. (e) The characteristic time $\tau^{*}$ as a function of $D_r$ shows a power-law dependence with two distinct scaling regimes. For $D_r \leq D_r^{*}$, $\tau^{*} \sim D_r^{-\alpha}$ with $\alpha = 0.5$, while for $D_r > D_r^{*}$, $\tau^{*} \sim D_r^{-\alpha}$ with $\alpha = 1$. The inset shows the collapsed plot of panel (d) using the scaling function $f(x)$ vs. $x$ for $D_r \leq D_r^{*}$ and $D_r > D_r^{*}$. (f) Effect of rotational noise $D_r$ on action-selection statistics, where black circles and red squares correspond to action 1 and action 2, respectively. Increasing noise drives a crossover in the learned policy, characterized by an enhanced preference for action 1 and a systematic reduction in the selection of action 2. Error bars show the standard deviation of the respective quantities in each plot.
  • Figure 5: (Color online) (a) $\langle T_{\mathrm{home}} \rangle$ vs. $D_r$, where "faster" and "slower" correspond to the faster and slower agents in the two-agent system and SP stands for the single-particle case. (b) Mean number of resetting events $\langle n \rangle$ vs. $D_r$, with symbols as in panel (a). Error bars show the standard deviation of the respective quantities in each plot.
  • ...and 8 more figures
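
The scaling relation quoted in the Figure 4(e) caption, $\tau^{*} \sim D_r^{-\alpha}$, is the kind of exponent usually estimated from the slope of a straight-line fit on log-log axes. The sketch below shows one way this could be done, assuming an ordinary least-squares fit; the function name `fit_power_law_exponent` is hypothetical, and the input values are synthetic placeholders generated from an exact power law, not data from the paper. The exponent here is unrelated to the learning rate $\alpha = 0.001$ of Figure 3, which shares the same symbol.

```python
import math


def fit_power_law_exponent(d_r_values, tau_values):
    """Estimate the exponent in tau ~ D_r**(-exponent).

    Uses an ordinary least-squares fit of log(tau) against log(D_r)
    and returns the negative of the fitted slope.
    """
    xs = [math.log(d) for d in d_r_values]
    ys = [math.log(t) for t in tau_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -s_xy / s_xx


# Synthetic placeholder data (NOT from the paper): an exact power law
# with exponent 0.5 and an arbitrary prefactor of 3.0.
d_r = [1.0, 2.0, 4.0, 8.0]
tau = [3.0 * d ** -0.5 for d in d_r]
exponent = fit_power_law_exponent(d_r, tau)
print(f"fitted exponent: {exponent:.3f}")
```

With measured values of $\tau^{*}$, the two regimes $D_r \leq D_r^{*}$ and $D_r > D_r^{*}$ would be fitted separately, since the caption reports a different exponent in each.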