Noise-based reward-modulated learning

Jesús García Fernández; Nasir Ahmad; Marcel van Gerven

TL;DR

Noise-based reward-modulated learning (NRL) addresses learning on neuromorphic hardware by unifying reinforcement learning with gradient-based optimization through noise-driven, locally available updates. It combines reward prediction errors $\delta_t = r_t - \bar{r}_t$, eligibility traces, and node-level noise to approximate gradients without backpropagation, enabling learning with delayed rewards via two forward passes (noisy and clean) and, when necessary, multiple noisy passes. Empirically, NRL reaches final performance close to backpropagation baselines on control tasks with instantaneous and delayed rewards, while markedly outperforming reward-modulated Hebbian learning in deeper networks; NRL also demonstrates scalability and resilience to hardware-noise constraints. These findings suggest a viable, energy-efficient learning paradigm for edge-oriented neuromorphic AI, one that turns substrate noise into a computational resource for credit assignment and adaptation.

Abstract

The pursuit of energy-efficient and adaptive artificial intelligence (AI) has positioned neuromorphic computing as a promising alternative to conventional computing. However, achieving learning on these platforms requires techniques that prioritize local information while enabling effective credit assignment. Here, we propose noise-based reward-modulated learning (NRL), a novel synaptic plasticity rule that mathematically unifies reinforcement learning and gradient-based optimization with biologically-inspired local updates. NRL addresses the computational bottleneck of exact gradients by approximating them through stochastic neural activity, transforming the inherent noise of biological and neuromorphic substrates into a functional resource. Drawing inspiration from biological learning, our method uses reward prediction errors as its optimization target to generate increasingly advantageous behavior, and eligibility traces to facilitate retrospective credit assignment. Experimental validation on reinforcement tasks, featuring immediate and delayed rewards, shows that NRL achieves performance comparable to baselines optimized using backpropagation, although with slower convergence, while showing significantly superior performance and scalability in multi-layer networks compared to reward-modulated Hebbian learning (RMHL), the most prominent similar approach. While tested on simple architectures, the results highlight the potential of noise-driven, brain-inspired learning for low-power adaptive systems, particularly in computing substrates with locality constraints. NRL offers a theoretically grounded paradigm well-suited for the event-driven characteristics of next-generation neuromorphic AI.
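
To make the ingredients named above concrete, the following is a minimal sketch of a generic noise-based, reward-modulated update on a single linear layer with immediate rewards. It is not the authors' exact rule: the toy task (`w_target`), the running-average reward baseline, the division by the noise variance, and all hyperparameters are assumptions chosen for illustration. With immediate rewards, the eligibility trace reduces to a single-step product of the noisy-minus-clean activity difference and the input.

```python
import numpy as np

def train_noise_based(steps=4000, n_in=3, n_out=2, sigma=0.5, lr=0.002, seed=0):
    """Illustrative noise-based reward-modulated learning on a toy task.

    All names and settings here are hypothetical, not taken from the paper.
    Returns the squared weight error before and after training.
    """
    rng = np.random.default_rng(seed)
    w_target = 0.5 * rng.normal(size=(n_out, n_in))  # assumed toy task
    w = np.zeros((n_out, n_in))
    r_bar = 0.0  # assumed reward prediction: running average of rewards
    initial_err = float(np.sum((w - w_target) ** 2))

    for _ in range(steps):
        x = rng.normal(size=n_in)

        # Two forward passes: one clean, one with node-level noise.
        clean = w @ x
        noisy = clean + sigma * rng.normal(size=n_out)

        # The reward is only observed for the noisy (executed) output.
        r = -float(np.sum((noisy - w_target @ x) ** 2))

        # Reward prediction error: delta_t = r_t - r_bar_t.
        delta = r - r_bar

        # Local eligibility: activity difference times presynaptic input.
        eligibility = np.outer(noisy - clean, x)

        # Reward-modulated update; on average it follows the reward gradient.
        w += lr * delta * eligibility / sigma**2

        # Update the reward prediction after it has been used.
        r_bar += 0.05 * (r - r_bar)

    final_err = float(np.sum((w - w_target) ** 2))
    return initial_err, final_err
```

Calling `initial_err, final_err = train_noise_based()` returns the squared weight error before and after training. Every quantity in the update is local to the layer except the scalar `delta`, which is the property that makes this family of rules attractive under locality constraints. For delayed rewards, one would instead accumulate `eligibility` into a decaying trace and apply the update when the reward arrives.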
