Ornstein-Uhlenbeck Adaptation as a Mechanism for Learning in Brains and Machines

Jesus Garcia Fernandez, Nasir Ahmad, Marcel van Gerven

TL;DR

Biological learning and neuromorphic hardware face challenges implementing exact gradient descent due to the need for precise gradients and non-local information. The authors propose Ornstein-Uhlenbeck adaptation (OUA), a gradient-free framework where parameter dynamics are governed by an OU process toward a mean that is updated by a global reward prediction error, enabling online exploration and exploitation in evolving environments. They validate OUA across supervised, reinforcement, recurrent, weather forecasting, and meta-learning tasks, showing robust learning and improved performance relative to baselines. The findings suggest OUA as a practical, hardware-friendly alternative to backpropagation with potential insights into noise-driven learning in the brain and scalable neuromorphic implementations.

Abstract

Learning is a fundamental property of intelligent systems, observed across biological organisms and engineered systems. While modern intelligent systems typically rely on gradient descent for learning, the need for exact gradients and complex information flow makes its implementation in biological and neuromorphic systems challenging. This has motivated the exploration of alternative learning mechanisms that can operate locally and do not rely on exact gradients. In this work, we introduce a novel approach that leverages noise in the parameters of the system and global reinforcement signals. Using an Ornstein-Uhlenbeck process with adaptive dynamics, our method balances exploration and exploitation during learning, driven by deviations from error predictions, akin to reward prediction error. Operating in continuous time, Ornstein-Uhlenbeck adaptation (OUA) is proposed as a general mechanism for learning in dynamic, time-evolving environments. We validate our approach across diverse tasks, including supervised learning and reinforcement learning in feedforward and recurrent systems. Additionally, we demonstrate that it can perform meta-learning, adjusting hyper-parameters autonomously. Our results indicate that OUA provides a viable alternative to traditional gradient-based methods, with potential applications in neuromorphic computing. It also hints at a possible mechanism for noise-driven learning in the brain, where stochastic neurotransmitter release may guide synaptic adjustments.
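The mechanism described above can be illustrated with a minimal sketch for a single-parameter model. This page does not reproduce the paper's equations, so the update rules below are an assumed form, not the authors' stated method: the parameter $\theta$ follows an OU process that relaxes toward a mean $\mu$ at rate $\lambda$ with noise scale $\sigma$; the mean $\mu$ moves toward $\theta$ in proportion to the reward prediction error $\delta_r = r - \bar{r}$ with rate $\eta$; and the average reward estimate $\bar{r}$ tracks $r$ at rate $\rho$. The sinusoidal input, the linear model $y = \theta x$, the squared-error reward, the Euler-Maruyama discretisation, and the function name `simulate_oua` are all illustrative choices. Default hyper-parameters and initial conditions follow the values given in the Figure 2 caption.

```python
import numpy as np


def simulate_oua(n_steps=10_000, dt=0.01, lam=1.0, eta=1.0, rho=1.0,
                 sigma=0.3, theta_star=1.0, seed=0):
    """Euler-Maruyama sketch of OUA for one parameter (assumed dynamics).

    Assumed update rules (not taken verbatim from the paper):
        d theta = lam * (mu - theta) dt + sigma dW
        d mu    = eta * delta_r * (theta - mu) dt
        d r_bar = rho * delta_r dt,   with delta_r = r - r_bar
    """
    rng = np.random.default_rng(seed)

    # Initial conditions as listed in the Figure 2 caption.
    theta, mu, r_bar = 0.0, 0.0, -1.0
    G = 0.0  # cumulative reward, accumulated as a Riemann sum of r

    mu_trace = np.empty(n_steps)
    for k in range(n_steps):
        t = k * dt
        x = np.sin(t)                # hypothetical input signal
        y = theta * x                # hypothetical linear model output
        y_target = theta_star * x    # target generated with fixed theta*
        r = -(y - y_target) ** 2     # assumed reward: negative squared error

        delta_r = r - r_bar          # reward prediction error (RPE)
        dW = rng.normal(0.0, np.sqrt(dt))  # Wiener increment

        # Compute all increments from the current state, then apply them.
        d_theta = lam * (mu - theta) * dt + sigma * dW
        d_mu = eta * delta_r * (theta - mu) * dt
        d_r_bar = rho * delta_r * dt

        theta += d_theta
        mu += d_mu
        r_bar += d_r_bar

        G += r * dt
        mu_trace[k] = mu

    return mu_trace, G


if __name__ == "__main__":
    mu_trace, G = simulate_oua()
    print(f"final mu: {mu_trace[-1]:.3f}, cumulative reward G: {G:.3f}")
```

Under these assumptions no gradient of the reward with respect to $\theta$ is ever computed: the only learning signal is the scalar $\delta_r$, which is shared by all parameters. When a noisy excursion of $\theta$ yields a higher-than-expected reward, $\mu$ is pulled toward that excursion; when the reward is lower than expected, $\mu$ is pushed away from it.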

Paper Structure

This paper contains 14 sections, 12 equations, 8 figures.

Figures (8)

  • Figure 1: Dependency structure of the variables that together determine Ornstein-Uhlenbeck adaptation (hyper-parameters not shown). Variables $\bar{r}$, $\vb*{\mu}$ and $\vb*{\theta}$ (green) are related to learning whereas variables $\vb*{z}$ and $\vb*{y}$ (blue) are related to inference. The average reward estimate $\bar{r}$ depends on rewards $r$ (red) that indirectly depend on the outputs $\vb*{y}$ generated by the model. The output itself depends on input $\vb*{x}$ (black).
  • Figure 2: Dynamics of a single-parameter model across 15 random seeds with $\rho = \lambda = \eta = 1$ and $\sigma = 0.3$. Initial conditions are $\bar{r}_0 = -1$, $\theta_0 = 0$, and $\mu_0 = 0$. The target output is generated with a fixed target parameter $\theta^* = 1$. (a) Target output vs model output. (b) Evolution of $\theta$. (c) Trajectories of $\mu$. (d) RPE $\delta_r$, shown on a logarithmic axis to better visualise initial convergence. A dotted line at 0 is added to depict convergence around this value. (e) Cumulative reward $G$ over time, showing improvement with learning compared to the untrained model (dashed).
  • Figure 3: Sensitivity of the final cumulative reward $G(T)$ to model hyper-parameters for the input-output mapping task. Results are averaged over 15 runs using different random seeds. The shaded area represents variability across runs, showing stability across a wide range of hyper-parameter settings. Vertical lines show the chosen hyper-parameter values, and horizontal lines show the return without learning. (a) Impact of $\lambda$. (b) Impact of $\sigma$. (c) Impact of $\rho$. (d) Impact of $\eta$.
  • Figure 4: Learning dynamics in a recurrent model with parameters $\vb*{\theta} = [\theta_1, \theta_2, \theta_3]$, where $\rho = \lambda = 1$, $\eta = 50$ and $\sigma = 0.2$. Initial conditions are $\bar{r}_0 = -0.1$, $\vb*{\theta}_0 = [0.2, 0.1, 0.5]$. The fixed target parameters used to generate the target output are $\vb*{\theta}^* = [0.3, 0.7, 1.0]$. (a) Target output vs model output. (b) Evolution of the parameters $\vb*{\theta}$. (c) Trajectories of $\vb*{\mu}$. (d) RPE $\delta_r = r - \bar{r}$, shown on a logarithmic axis to better visualise initial convergence. A dotted line at 0 is added to depict convergence around this value. (e) Cumulative reward $G$ over time, showing improvement with learning compared to the untrained model (dashed). The blue line indicates the return during parameter learning. The dashed line denotes the return when $\vb*{\theta}$ are fixed to their initial values $\vb*{\theta}_0$. The orange line indicates the return obtained when we fix parameters to the final mean parameters $\vb*{\theta}(t) = \vb*{\mu}(T)$.
  • Figure 5: Learning dynamics in a model with parameters $\vb*{\theta} = (\theta_1, \ldots, \theta_6)^\top$, where $\rho = \lambda = \eta = 1$ and $\sigma = 0.2$. Initial conditions are $\bar{r}_0 = -1$ and $\vb*{\theta}_0 = \vb*{\mu}_0 = \mathbf{0}$. The fixed target parameters used to generate the target output are $\vb*{\theta}^* = (0.3, 1.1, 0.0, -0.3, -1.5, -0.4)^\top$. (a) Target output vs model output. (b) Evolution of the parameters $\vb*{\theta}$. (c) Trajectories of $\vb*{\mu}$. (d) RPE $\delta_r = r - \bar{r}$, shown on a logarithmic axis to better visualise initial convergence. A dotted line at 0 is added to depict convergence around this value. (e) Cumulative reward $G$ over time, showing improvement with learning compared to the untrained model (dashed). The blue line indicates the return during parameter learning. The dashed line denotes the return when $\vb*{\theta}$ are fixed to their initial values $\vb*{\theta}_0$. The orange line indicates the return obtained when we fix parameters to the final mean parameters $\vb*{\theta}(t) = \vb*{\mu}(T)$.
  • ...and 3 more figures