Continuous Time Continuous Space Homeostatic Reinforcement Learning (CTCS-HRRL) : Towards Biological Self-Autonomous Agent

Hugo Laurencon, Yesoda Bhargava, Riddhi Zantye, Charbel-Raphaël Ségerie, Johann Lussange, Veeky Baths, Boris Gutkin

TL;DR

The work tackles continuous-time, continuous-space homeostatic learning by extending HRRL to CTCS-HRRL, enabling self-autonomous agents to regulate internal states in dynamic environments. It derives an equivalence between maximizing the expected discounted reward $V^\pi$ and minimizing the discounted drive $J^\pi$, using the drive $d(\delta)=\sqrt{\delta^T \delta}$ and the Hamilton-Jacobi-Bellman framework: $-\log(\gamma) J^{*}(\zeta)= \min_a [ d(\zeta,u_a) + (\partial J^{*}/\partial \zeta) \cdot f(\zeta,u_a) ]$. The method combines model-based learning with neural networks to estimate the optimal deviation function $J^*$ and uses a target network for stability. A 2D resource-foraging simulation demonstrates that the agent learns policies that maintain homeostasis, showing continuous adaptation with fatigue dynamics and improved resource allocation, suggesting applicability to modeling animal decision-making and bio-inspired control in continuous domains.
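As a minimal sketch of how the right-hand side of the HJB relation above could be used to choose an action, the code below evaluates $d + (\partial J/\partial \zeta) \cdot f$ for each candidate control and picks the smallest. The dynamics `toy_dynamics`, the stand-in `toy_J`, the control set, and the simplification that the drive depends only on the deviation $\zeta$ are all assumptions made for illustration, not the paper's model.

```python
import numpy as np

def drive(delta):
    """Drive d(delta) = sqrt(delta^T delta): Euclidean distance from the set point."""
    delta = np.asarray(delta, dtype=float)
    return float(np.sqrt(delta @ delta))

def finite_diff_grad(fn, zeta, eps=1e-5):
    """Central finite-difference gradient of a scalar function fn at zeta."""
    zeta = np.asarray(zeta, dtype=float)
    g = np.zeros_like(zeta)
    for i in range(zeta.size):
        step = np.zeros_like(zeta)
        step[i] = eps
        g[i] = (fn(zeta + step) - fn(zeta - step)) / (2.0 * eps)
    return g

def toy_dynamics(zeta, u):
    """Hypothetical dynamics f(zeta, u): the control shifts the deviation,
    with a mild relaxation toward the set point."""
    return np.asarray(u, dtype=float) - 0.1 * np.asarray(zeta, dtype=float)

def toy_J(zeta):
    """Hypothetical stand-in for a learned J*: squared distance from the set point."""
    zeta = np.asarray(zeta, dtype=float)
    return float(zeta @ zeta)

def greedy_action(zeta, controls, dynamics, J):
    """Return the index of the control minimizing d(zeta) + grad J(zeta) . f(zeta, u),
    together with the cost of every candidate."""
    dJ = finite_diff_grad(J, zeta)
    costs = [drive(zeta) + float(dJ @ dynamics(zeta, u)) for u in controls]
    return int(np.argmin(costs)), costs

# Deviation of two internal variables from their set points (illustrative values).
zeta = np.array([2.0, 0.5])
controls = [np.array([0.0, 0.0]), np.array([-1.0, 0.0]), np.array([0.0, -1.0])]
best, costs = greedy_action(zeta, controls, toy_dynamics, toy_J)
print(best, costs)
```

In this toy setting the selected control is the one that reduces the larger of the two deviations, which is the behaviour the minimization is meant to produce.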

Abstract

Homeostasis is a biological process by which living beings maintain their internal balance. Previous research suggests that homeostasis is a learned behaviour. The recently introduced Homeostatic Regulated Reinforcement Learning (HRRL) framework attempts to explain this learned homeostatic behaviour by linking Drive Reduction Theory and Reinforcement Learning. This linkage has been proven in discrete time and space, but not in continuous time and space. In this work, we advance the HRRL framework to a continuous time-space environment and validate the CTCS-HRRL (Continuous Time Continuous Space HRRL) framework. We achieve this by designing a model that mimics the homeostatic mechanisms of a real-world biological agent. This model uses the Hamilton-Jacobi-Bellman equation together with neural-network function approximation and Reinforcement Learning. Through a simulation-based experiment, we demonstrate the efficacy of this model and present evidence of the agent's ability to dynamically choose policies that favor homeostasis in a continuously changing internal-state milieu. The results of our experiments demonstrate that the agent learns homeostatic behaviour in a CTCS environment, making CTCS-HRRL a promising framework for modelling animal dynamics and decision-making.
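To make the combination of the Hamilton-Jacobi-Bellman equation with function approximation concrete, here is a hedged sketch of an HJB residual that a training loss could be built on, with a slowly updated target copy of the parameters. The quadratic approximator `J`, the value of `GAMMA`, the choice of which term the target copy supplies, and the soft-update rule are all assumptions for illustration; the summary does not specify how the paper's network or target network is set up.

```python
import numpy as np

GAMMA = 0.99  # discount factor (illustrative value, not from the paper)

def J(w, zeta):
    """Toy deviation-function approximator: J_w(zeta) = sum_i w_i * zeta_i**2."""
    w = np.asarray(w, dtype=float)
    zeta = np.asarray(zeta, dtype=float)
    return float(np.dot(w, np.square(zeta)))

def grad_J(w, zeta):
    """Gradient of J_w with respect to zeta: 2 * w * zeta."""
    return 2.0 * np.asarray(w, dtype=float) * np.asarray(zeta, dtype=float)

def hjb_residual(w, w_target, zeta, f_val):
    """Residual of -log(gamma) * J = d + dJ/dzeta . f for one state and one control.

    Assumption: the target weights supply the left-hand side, while the
    online weights supply the gradient term on the right-hand side.
    """
    d = float(np.linalg.norm(zeta))                    # drive at this state
    lhs = -np.log(GAMMA) * J(w_target, zeta)
    rhs = d + float(np.dot(grad_J(w, zeta), f_val))
    return rhs - lhs

def soft_update(w_target, w, tau=0.01):
    """Move the target weights a fraction tau toward the online weights."""
    w_target = np.asarray(w_target, dtype=float)
    w = np.asarray(w, dtype=float)
    return (1.0 - tau) * w_target + tau * w

# One illustrative evaluation: state deviation, and the dynamics f(zeta, u)
# already evaluated for the chosen control.
w = np.array([1.0, 1.0])
w_target = w.copy()
zeta = np.array([3.0, 4.0])
f_val = np.array([-1.0, 0.0])
residual = hjb_residual(w, w_target, zeta, f_val)
loss = residual ** 2
w_target = soft_update(w_target, w, tau=0.01)
print(residual, loss)
```

Squaring the residual gives a per-sample loss; holding the target copy fixed between soft updates is a common way to keep such a regression target from moving too quickly.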

Paper Structure (18 sections, 1 theorem, 16 equations, 5 figures, 1 algorithm)


Key Result

Lemma 1

The pursuit of homeostatic stability is equivalent to the maximization of the reward. Formally, we have
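The relation can be sketched as follows, under assumptions that this summary does not state explicitly: the reward is taken to be the instantaneous rate of drive reduction, $r(t) = -\frac{\mathrm{d}}{\mathrm{d}t}\, d(\zeta(t))$ (a continuous-time analogue of the drive-difference reward of HRRL), the discount factor satisfies $0 < \gamma < 1$, and the drive stays bounded along the trajectory. With $V^\pi$ and $J^\pi$ as the discounted reward and discounted drive under policy $\pi$, integration by parts gives

$$
\begin{aligned}
V^\pi(\zeta_0) &= \int_0^\infty \gamma^t \, r(t)\, \mathrm{d}t
 = -\int_0^\infty \gamma^t \, \frac{\mathrm{d}}{\mathrm{d}t} d(\zeta(t))\, \mathrm{d}t \\
&= \Big[-\gamma^t d(\zeta(t))\Big]_0^\infty + \log(\gamma) \int_0^\infty \gamma^t \, d(\zeta(t))\, \mathrm{d}t \\
&= d(\zeta_0) + \log(\gamma)\, J^\pi(\zeta_0).
\end{aligned}
$$

Since $d(\zeta_0)$ does not depend on the policy and $\log(\gamma) < 0$, it follows under these assumptions that

$$
\arg\max_\pi V^\pi(\zeta_0) = \arg\min_\pi J^\pi(\zeta_0),
$$

which is consistent with the factor $-\log(\gamma)$ appearing in the Hamilton-Jacobi-Bellman relation for $J^{*}$.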

Figures (5)

  • Figure 1: The environment of the simulation experiment. The agent is represented by a gray point and is located by its coordinates in the plane. The colored circles indicate the two resources present in the environment that the agent has to consume. These colored circles delimit the space in which it is possible to consume a resource.
  • Figure 2: Resource consumption for the two resources in the square environment. The homeostatic set point for Resource 1 is 1 and for Resource 2 is 2, as indicated by dashed black and red lines respectively. (a) 6000 iterations. (b) 8000 iterations. (c) 10000 iterations. (d) 14000 iterations.
  • Figure 3: Plot showing the variation in the muscular and sleep fatigue.
  • Figure 4: Plots showing the variation in the loss of the deviation function ($J$). (a) 6000 iterations. (b) 8000 iterations. (c) 10000 iterations. (d) 14000 iterations.
  • Figure 5: Agent learning and exploration in an unknown environment. The figure shows the agent's track over the course of training. (a) Beginning point. (b) Exploring the environment. (c) Further exploration. (d) Learning resource positions. (e) 6000 iterations. (f) 8000 iterations. (g) 10000 iterations. (h) 14000 iterations.

Theorems & Definitions (1)

  • Lemma 1