Noise-based reward-modulated learning

Jesús García Fernández, Nasir Ahmad, Marcel van Gerven

TL;DR

NRL addresses learning on neuromorphic hardware by unifying reinforcement learning with gradient-based optimization through noise-driven, locally available updates. It builds on reward prediction errors $\delta_t = r_t - \bar{r}_t$, eligibility traces, and node-level noise to approximate gradients without backpropagation, enabling learning with delayed rewards via two forward passes (noisy and clean) and, when necessary, multiple noisy passes. Empirical results show that NRL achieves final performance close to backpropagation baselines on control tasks with instantaneous and delayed rewards, while markedly outperforming reward-modulated Hebbian learning in deeper networks; NRL also demonstrates scalability and resilience to hardware-noise constraints. These findings suggest a viable, energy-efficient learning paradigm for edge-oriented neuromorphic AI, transforming substrate noise into a computational resource for credit assignment and adaptation.
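To make the "two forward passes (noisy and clean)" idea concrete, the following is a minimal sketch of a generic node-perturbation update on a toy linear layer. It is not the authors' implementation: the toy reward, the names `W_true`, `sigma`, and `eta`, and the use of the clean-pass reward as the prediction $\bar{r}_t$ are all assumptions made for illustration, and eligibility traces are omitted because the toy reward is instantaneous.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 3, 2
sigma = 0.1   # std of the injected node noise (assumed value)
eta = 0.02    # learning rate (assumed value)

W_true = rng.normal(size=(n_out, n_in))  # hypothetical target mapping
W = np.zeros((n_out, n_in))              # weights being learned


def reward(y, target):
    """Toy instantaneous reward: negative squared error (an assumption)."""
    return -np.sum((y - target) ** 2)


initial_error = np.sum((W - W_true) ** 2)

for _ in range(2000):
    x = rng.normal(size=n_in)
    target = W_true @ x

    # Clean pass: no noise injected at the output nodes.
    y_clean = W @ x
    # Noisy pass: Gaussian noise added at each output node.
    xi = rng.normal(0.0, sigma, size=n_out)
    y_noisy = y_clean + xi

    # Reward prediction error, with the clean-pass reward standing in
    # for the prediction r_bar (assumption of this sketch).
    delta = reward(y_noisy, target) - reward(y_clean, target)

    # Local update: each weight sees only its input, its node's noise,
    # and the scalar delta. In expectation this follows the reward gradient.
    W += eta * (delta / sigma**2) * np.outer(xi, x)

final_error = np.sum((W - W_true) ** 2)
print(initial_error, final_error)
```

No gradient is propagated backwards here: the only global signal is the scalar `delta`, which is what makes updates of this kind attractive for substrates with locality constraints.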

Abstract

The pursuit of energy-efficient and adaptive artificial intelligence (AI) has positioned neuromorphic computing as a promising alternative to conventional computing. However, achieving learning on these platforms requires techniques that prioritize local information while enabling effective credit assignment. Here, we propose noise-based reward-modulated learning (NRL), a novel synaptic plasticity rule that mathematically unifies reinforcement learning and gradient-based optimization with biologically-inspired local updates. NRL addresses the computational bottleneck of exact gradients by approximating them through stochastic neural activity, transforming the inherent noise of biological and neuromorphic substrates into a functional resource. Drawing inspiration from biological learning, our method uses reward prediction errors as its optimization target to generate increasingly advantageous behavior, and eligibility traces to facilitate retrospective credit assignment. Experimental validation on reinforcement tasks, featuring immediate and delayed rewards, shows that NRL achieves performance comparable to baselines optimized using backpropagation, although with slower convergence, while showing significantly superior performance and scalability in multi-layer networks compared to reward-modulated Hebbian learning (RMHL), the most prominent similar approach. While tested on simple architectures, the results highlight the potential of noise-driven, brain-inspired learning for low-power adaptive systems, particularly in computing substrates with locality constraints. NRL offers a theoretically grounded paradigm well-suited for the event-driven characteristics of next-generation neuromorphic AI.

Paper Structure

This paper contains 17 sections, 2 theorems, 35 equations, 4 figures, 2 tables.

Key Result

Theorem A.3

Let $\epsilon \sim \mathcal{N}(0, \sigma^2 I_n)$ (probability distribution $p(\epsilon)$) where $n$ is the number of dimensions in $\theta$. Exact gradients can be written in terms of directional derivatives using expectations
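One standard identity of this kind, given here as an assumption since the paper's exact form and normalization may differ, is $\nabla f(\theta) = \frac{1}{\sigma^2}\,\mathbb{E}_{p(\epsilon)}\!\left[\epsilon\,\big(\epsilon^\top \nabla f(\theta)\big)\right]$, which follows from $\mathbb{E}[\epsilon\epsilon^\top] = \sigma^2 I_n$. The sketch below checks it numerically on a hypothetical quadratic $f(\theta) = \tfrac{1}{2}\theta^\top A\theta$; the matrix `A`, the point `theta`, and the function name are illustrative choices.

```python
import numpy as np


def gaussian_gradient_estimate(grad, sigma, num_samples, rng):
    """Monte Carlo estimate of (1/sigma^2) * E[eps * (eps . grad)]."""
    eps = rng.normal(0.0, sigma, size=(num_samples, grad.shape[0]))
    directional = eps @ grad  # directional derivative along each sample
    return (eps * directional[:, None]).mean(axis=0) / sigma**2


rng = np.random.default_rng(0)

# Hypothetical test function f(theta) = 0.5 * theta^T A theta,
# whose exact gradient is A @ theta for symmetric A.
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 3.0]])
theta = np.array([1.0, -1.0, 0.5])
exact = A @ theta

estimate = gaussian_gradient_estimate(exact, sigma=0.1,
                                      num_samples=200_000, rng=rng)
print(exact, estimate)
```

The estimate matches the exact gradient only up to sampling error, which shrinks as the number of noise samples grows. That is consistent with the summary's remark that multiple noisy passes may be used when necessary.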

Figures (4)

  • Figure 1: Performance on benchmarks. A, B, C: Reaching problem. D, E, F: Acrobot problem. G, H, I: Cartpole problem. Left panels: Performance across trials averaged over 5 runs. Centre panels: Final performance (mean of the last 50 trials), averaged over 5 runs. Right panels: Problem visualization.
  • Figure 2: Performance on deeper networks for the Acrobot. A, B: 2-hidden layer networks. C, D: 3-hidden layer networks. Left panels: Performance across trials averaged over 5 runs. Right panels: Final performance (mean of the last 50 trials), averaged over 5 runs.
  • Figure 3: Performance on deeper networks for the Cartpole. A, B: 2-hidden layer networks. C, D: 3-hidden layer networks. Left panels: Performance across trials averaged over 5 runs. Right panels: Final performance (mean of the last 50 trials), averaged over 5 runs.
  • Figure 4: Learning using only noisy passes for the Acrobot. A: Clean pass approximation error. Each data point is computed with an absolute error, averaged over 500 timesteps using the Acrobot problem. B: Acrobot problem. Performance across trials, averaged over 5 runs, with mean, minimum, and maximum values displayed.

Theorems & Definitions (6)

  • Definition A.1
  • Definition A.2
  • Theorem A.3
  • proof
  • Theorem A.4
  • proof