Continuous Time Continuous Space Homeostatic Reinforcement Learning (CTCS-HRRL): Towards Biological Self-Autonomous Agent
Hugo Laurencon, Yesoda Bhargava, Riddhi Zantye, Charbel-Raphaël Ségerie, Johann Lussange, Veeky Baths, Boris Gutkin
TL;DR
The work tackles continuous-time, continuous-space homeostatic learning by extending HRRL to CTCS-HRRL, enabling self-autonomous agents to regulate internal states in dynamic environments. It derives an equivalence between maximizing the expected discounted reward $V^\pi$ and minimizing the discounted drive $J^\pi$, using the drive $d(\delta)=\sqrt{\delta^T \delta}$ and the Hamilton-Jacobi-Bellman framework: $-\log(\gamma) J^{*}(\zeta)= \min_a [ d(\zeta,u_a) + (\partial J^{*}/\partial \zeta) \cdot f(\zeta,u_a) ]$. The method combines model-based learning with neural networks to estimate the optimal deviation function $J^*$ and uses a target network for stability. A 2D resource-foraging simulation demonstrates that the agent learns policies that maintain homeostasis, showing continuous adaptation with fatigue dynamics and improved resource allocation, suggesting applicability to modeling animal decision-making and bio-inspired control in continuous domains.
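The Hamilton-Jacobi-Bellman condition quoted above can be checked numerically on a toy problem. The sketch below is illustrative only and is not the paper's 2D foraging environment: it assumes a one-dimensional deviation $\zeta$, hypothetical dynamics $f(\zeta,u)=u$ with three controls $u \in \{-1, 0, 1\}$, and a drive that depends on the state alone, $d(\zeta,u)=\sqrt{\zeta^T\zeta}$. Under those assumptions, writing $\rho=-\log(\gamma)$, the optimal discounted drive has the closed form $J^{*}(\zeta)=|\zeta|/\rho-(1-e^{-\rho|\zeta|})/\rho^{2}$, and the residual of the equation should vanish.

```python
import numpy as np

def drive(delta):
    """Drive d(delta) = sqrt(delta^T delta), i.e. the Euclidean norm."""
    delta = np.atleast_1d(np.asarray(delta, dtype=float))
    return float(np.sqrt(delta @ delta))

GAMMA = 0.9                  # discount factor (illustrative value)
RHO = -np.log(GAMMA)         # continuous-time discount rate, -log(gamma)
CONTROLS = (-1.0, 0.0, 1.0)  # hypothetical control set, not from the paper

def f(zeta, u):
    """Assumed toy dynamics: the control directly sets d(zeta)/dt."""
    return u

def J_star(zeta):
    """Closed-form optimal discounted drive for this toy problem."""
    z = abs(zeta)
    return z / RHO - (1.0 - np.exp(-RHO * z)) / RHO**2

def dJ_star(zeta):
    """Derivative of J_star with respect to zeta."""
    z = abs(zeta)
    return np.sign(zeta) * (1.0 - np.exp(-RHO * z)) / RHO

def hjb_residual(zeta):
    """-log(gamma) J*(zeta) - min_a [ d(zeta, u_a) + dJ*/dzeta * f(zeta, u_a) ]."""
    lhs = RHO * J_star(zeta)
    rhs = min(drive(zeta) + dJ_star(zeta) * f(zeta, u) for u in CONTROLS)
    return lhs - rhs

# Residuals at a few nonzero deviations; each should be zero up to rounding.
residuals = [hjb_residual(z) for z in (-2.0, -0.5, 0.5, 1.0, 3.0)]
print(residuals)
```

In the setting the summary describes, $J^{*}$ has no closed form and is estimated by a neural network, so a residual of this kind would serve as a training signal rather than an exact check.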
Abstract
Homeostasis is a biological process by which living beings maintain their internal balance. Previous research suggests that homeostasis is a learned behaviour. The recently introduced Homeostatic Regulated Reinforcement Learning (HRRL) framework attempts to explain this learned homeostatic behaviour by linking Drive Reduction Theory and Reinforcement Learning. This link has been proven in discrete time and space, but not in continuous time and space. In this work, we extend the HRRL framework to a continuous time-space environment and validate the resulting CTCS-HRRL (Continuous Time Continuous Space HRRL) framework. We achieve this by designing a model that mimics the homeostatic mechanisms of a real-world biological agent. The model uses the Hamilton-Jacobi-Bellman equation together with neural-network function approximation and Reinforcement Learning. Through a simulation-based experiment, we demonstrate the efficacy of this model and provide evidence of the agent's ability to dynamically choose policies that favour homeostasis in a continuously changing internal-state milieu. The results of our experiments demonstrate that the agent learns homeostatic behaviour in a CTCS environment, making CTCS-HRRL a promising framework for modelling animal dynamics and decision-making.
