Homing through Reinforcement Learning
Riya Singh, Pratikshya Jena, Anish Kumar, Shradha Mishra
TL;DR
This work models homing in a 2D circular arena using a Q-learning RL agent whose state is the angular deviation from home and whose actions are deterministic alignment or stochastic reorientation. The agent minimizes a radial-distance-based cost, with a radially dependent angular threshold governing motion toward home and a threshold-driven state update. This yields a non-monotonic dependence of the mean homing time on the rotational diffusion $D_r$, with an optimal value $D_r^{*}$. Extending to two and multiple agents with short-range repulsion reveals an asymmetry among agents and a collective gain for the fastest agent as group size grows. RL trajectories are also shorter and less noisy than Active Brownian Particle (ABP) trajectories, illustrating the advantage of learning-based navigation. The results connect individual and collective homing with adaptive decision-making, noise, and interactions, offering insights for biological navigation and coordinated robotics.
Abstract
Homing and navigation are fundamental behaviors in biological systems that enable agents to reliably reach a target under uncertainty. We present a Reinforcement Learning (RL) framework to model adaptive homing in a continuous two-dimensional domain. In this framework, the agent's state is given by its angular deviation from home, actions correspond to alignment or stochastic reorientation, and learning is driven by a radial-distance-based cost that penalizes motion away from the target. For a single self-propelled agent moving with constant speed, we find that the mean homing time $\langle T_{\mathrm{home}} \rangle$ exhibits a non-monotonic dependence on the rotational diffusion strength $D_r$, with an optimal noise level $D_r^{*}$, revealing a subtle interplay between exploration and goal-directed correction. Extending to two agents with soft repulsion, one agent consistently reaches home faster than the other, while in multi-agent systems, repulsion ensures separation and the fastest agent becomes progressively faster as group size increases. Finally, comparing the mean homing time $\langle T_{\mathrm{home}} \rangle$ of an Active Brownian Particle (ABP) and an RL agent over an identical range of $D_r$, we find that RL trajectories are shorter, less noisy, and consistently faster. Our results show that cost-driven learning, stochastic reorientation, and inter-agent interactions enable efficient adaptive navigation, linking individual and collective homing. This reinforcement learning framework captures key biological features such as feedback-based route learning, randomness to escape unfavorable orientations, and mutual coordination.
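The single-agent setup described above can be illustrated with a minimal tabular Q-learning sketch. It is not the authors' implementation: the bin count, learning rate `alpha`, discount `gamma`, exploration rate `eps`, time step `dt`, arena radius `R`, home radius `r_home`, the boundary handling, and all function names are assumptions chosen for illustration, and the radially dependent angular threshold is omitted. The state is the discretized angular deviation of the heading from the home direction, the two actions are alignment and a Gaussian reorientation kick of variance $2 D_r \, dt$, and the cost is the change in radial distance per step.

```python
import math
import random

ALIGN, REORIENT = 0, 1  # the two actions named in the abstract

def wrap(angle):
    """Wrap an angle to [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def discretize(dev, n_bins):
    """Map an angular deviation to a bin index in 0..n_bins-1 (assumed state encoding)."""
    frac = (wrap(dev) + math.pi) / (2 * math.pi)
    return min(int(frac * n_bins), n_bins - 1)

def deviation(x, y, theta):
    """Angle between the heading theta and the direction to home (the origin)."""
    return wrap(theta - math.atan2(-y, -x))

def run_episode(Q, D_r, rng, v0=1.0, dt=0.1, R=10.0, r_home=0.5,
                alpha=0.1, gamma=0.9, eps=0.1, max_steps=5000):
    """Run one homing episode, updating Q in place; return the number of steps taken."""
    n_bins = len(Q)
    phi = rng.uniform(-math.pi, math.pi)
    x, y = 0.5 * R * math.cos(phi), 0.5 * R * math.sin(phi)  # assumed start at r = R/2
    theta = rng.uniform(-math.pi, math.pi)
    for step in range(1, max_steps + 1):
        s = discretize(deviation(x, y, theta), n_bins)
        # Epsilon-greedy choice; Q stores expected cost, so the greedy action minimizes it.
        if rng.random() < eps:
            a = rng.choice((ALIGN, REORIENT))
        else:
            a = ALIGN if Q[s][ALIGN] <= Q[s][REORIENT] else REORIENT
        if a == ALIGN:
            theta = math.atan2(-y, -x)  # deterministic alignment toward home
        else:
            # Stochastic reorientation: rotational-diffusion kick with variance 2 * D_r * dt.
            theta = wrap(theta + math.sqrt(2 * D_r * dt) * rng.gauss(0.0, 1.0))
        r_old = math.hypot(x, y)
        x += v0 * math.cos(theta) * dt  # constant-speed self-propulsion
        y += v0 * math.sin(theta) * dt
        r_new = math.hypot(x, y)
        if r_new > R:  # assumed boundary rule: project back onto the arena wall
            x, y = x * R / r_new, y * R / r_new
            r_new = R
        cost = r_new - r_old  # positive when the agent moves away from home
        if r_new < r_home:  # reached home: terminal update without bootstrapping
            Q[s][a] += alpha * (cost - Q[s][a])
            return step
        s_next = discretize(deviation(x, y, theta), n_bins)
        Q[s][a] += alpha * (cost + gamma * min(Q[s_next]) - Q[s][a])
    return max_steps

def train(n_episodes=200, n_bins=16, D_r=1.0, seed=0):
    """Train over several episodes; return the Q-table and the per-episode homing times."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_bins)]
    times = [run_episode(Q, D_r, rng) for _ in range(n_episodes)]
    return Q, times
```

Calling `train` for several values of `D_r` and averaging the returned homing times gives a rough estimate of $\langle T_{\mathrm{home}} \rangle$ as a function of $D_r$. Because this sketch leaves out the angular threshold, it should not be expected to reproduce the reported optimum $D_r^{*}$.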
