Neuromorphic dreaming: A pathway to efficient learning in artificial agents
Ingo Blakowski, Dmitrii Zendrikov, Cristiano Capone, Giacomo Indiveri
TL;DR
The paper addresses energy and data efficiency in reinforcement learning by implementing a model-based RL framework with spiking neural networks on mixed-signal neuromorphic hardware. It introduces a two-network architecture (agent and world model) and an awake-dreaming training protocol that alternates real-environment interactions with simulated rollouts to boost sample efficiency. The agent is trained with local learning rules such as e-prop on a policy-gradient objective $E^A = -\sum_t R^t \log(\pi^t_k)$, where $R^t = \sum_{t' \ge t} \gamma^{t'-t} r^{t'}$ is the discounted return. The world model is trained with supervised e-prop-based readouts that minimize a combined state and reward prediction loss $E^M = c_\xi \sum_{t,k} (\xi^{\star t+1}_k - \xi^{t+1}_k)^2 + c_r \sum_t (r^{\star t+1} - r^{t+1})^2$, which lets it generate accurate imagined experiences. Validation on Atari Pong shows that dreaming reduces the number of real environment interactions required while maintaining or improving learning performance, and that the approach runs in real time on the DYNAP-SE neuromorphic processor with sub-milliwatt power consumption. This supports a practical path toward energy-efficient neuromorphic learning for real-world robotics and intelligent agents.
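The two objectives quoted above can be evaluated numerically in a few lines. The following is a minimal sketch in plain Python, not the authors' implementation: the function names, argument layout, and the use of nested lists for state vectors are assumptions made for illustration, and the e-prop weight updates themselves are not shown.

```python
import math

def discounted_returns(rewards, gamma):
    """R^t = sum_{t' >= t} gamma^(t'-t) r^{t'}, accumulated backwards in time."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def policy_gradient_loss(rewards, chosen_action_probs, gamma):
    """E^A = -sum_t R^t log(pi^t_k).

    chosen_action_probs[t] is the policy probability of the action k
    actually taken at step t (hypothetical argument name).
    """
    returns = discounted_returns(rewards, gamma)
    return -sum(R * math.log(p) for R, p in zip(returns, chosen_action_probs))

def world_model_loss(states_a, states_b, rewards_a, rewards_b, c_xi, c_r):
    """E^M = c_xi * sum_{t,k} (xi_a - xi_b)^2 + c_r * sum_t (r_a - r_b)^2.

    One of each pair is the target and the other the world-model prediction;
    the squared error is symmetric, so the order does not matter here.
    states_a and states_b are lists of per-step state vectors.
    """
    state_err = sum(
        (a - b) ** 2
        for vec_a, vec_b in zip(states_a, states_b)
        for a, b in zip(vec_a, vec_b)
    )
    reward_err = sum((a - b) ** 2 for a, b in zip(rewards_a, rewards_b))
    return c_xi * state_err + c_r * reward_err
```

For a three-step episode with a single reward at the end, `discounted_returns([0.0, 0.0, 1.0], 0.5)` propagates that reward back to earlier steps with geometric discounting, which is what weights each log-probability term in $E^A$.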
Abstract
Achieving energy efficiency in learning is a key challenge for artificial intelligence (AI) computing platforms. Biological systems demonstrate remarkable abilities to learn complex skills quickly and efficiently. Inspired by this, we present a hardware implementation of model-based reinforcement learning (MBRL) using spiking neural networks (SNNs) on mixed-signal analog/digital neuromorphic hardware. This approach leverages the energy efficiency of mixed-signal neuromorphic chips while achieving high sample efficiency through an alternation of online learning, referred to as the "awake" phase, and offline learning, known as the "dreaming" phase. The proposed model includes two symbiotic networks: an agent network that learns by combining real and simulated experiences, and a learned world model network that generates the simulated experiences. We validate the model by training the hardware implementation to play the Atari game Pong. We start from a baseline consisting of an agent network learning without a world model or dreaming, which successfully learns to play the game. By incorporating dreaming, the number of required real game experiences is reduced significantly compared to the baseline. The networks are implemented on a mixed-signal neuromorphic processor, with the readout layers trained using a computer in the loop, while the other layers remain fixed. These results pave the way toward energy-efficient neuromorphic learning systems capable of rapid learning in real-world applications.
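The alternation between awake and dreaming phases described above can be pictured as a simple training schedule. The sketch below is schematic and rests on several assumptions not stated in the abstract: the callback names, the `dreams_per_awake` parameter, and the choice to update the world model only from real episodes are all hypothetical, and the actual ordering and ratio of phases may differ.

```python
def awake_dreaming_training(n_iterations, awake_episode, dream_episode,
                            update_agent, update_world_model,
                            dreams_per_awake=1):
    """Alternate real ("awake") and simulated ("dreaming") episodes.

    All four callbacks are hypothetical placeholders:
      awake_episode()        -> trajectory collected in the real environment
      dream_episode()        -> trajectory generated by the world model
      update_agent(traj)     -> agent readout update from a trajectory
      update_world_model(t)  -> world-model readout update from real data
    Returns the number of real episodes consumed.
    """
    real_episodes = 0
    for _ in range(n_iterations):
        # Awake phase: interact with the real environment.
        real_traj = awake_episode()
        real_episodes += 1
        update_agent(real_traj)
        update_world_model(real_traj)  # assumption: trained on real data only

        # Dreaming phase: the agent learns from imagined rollouts.
        for _ in range(dreams_per_awake):
            update_agent(dream_episode())
    return real_episodes
```

In this picture, each real episode yields `1 + dreams_per_awake` agent updates, which is one way to see how dreaming can lower the number of real game experiences needed for a given amount of agent training.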
