Learning To Play Atari Games Using Dueling Q-Learning and Hebbian Plasticity
Md Ashfaq Salehin
TL;DR
This work investigates learning Atari games from raw pixels using advanced deep Q-learning variants, notably Double DQN and Dueling DQN, and extends them with differentiable Hebbian plasticity to enable lifelong learning. By coupling a fixed-weight network with trainable plastic components, the study demonstrates improved stability and performance during plastic training phases, while acknowledging that the approach lags behind DeepMind's state-of-the-art DQN, which benefits from stronger infrastructure and replay enhancements. The methodology includes a CNN feature extractor, separate value and advantage streams, and a Hebbian trace-based plasticity mechanism that updates during learning. Results show higher, more stable rewards during plastic phases and potential for continued adaptation without catastrophic forgetting. The findings suggest differentiable plasticity is a promising direction for reinforcement learning, warranting further exploration with prioritized replay, longer training, and ensemble strategies to close the gap with top-performing systems. Overall, the paper contributes a practical platform for comparing standard and plasticity-enhanced RL agents on Atari benchmarks and highlights lifelong learning as a key benefit of plasticity in deep RL.
Abstract
In this work, an advanced deep reinforcement learning architecture is used to train neural network agents to play Atari games. Given only the raw game pixels, action space, and reward information, the system can train agents to play any Atari game. The system first uses advanced techniques such as deep Q-networks and dueling Q-networks to train efficient agents; these are the same techniques DeepMind used to train agents that beat human players at Atari games. As an extension, plastic neural networks are used as agents, and their feasibility is analyzed in this scenario. The plasticity implementation is based on backpropagation and the Hebbian update rule. Plastic neural networks offer useful properties such as lifelong learning after the initial training, which makes them well suited to adaptive learning environments. As a new analysis of plasticity in this context, this work might provide valuable insights and direction for future work.
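To make the two core ideas concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes the commonly used dueling aggregation, Q(s, a) = V(s) + A(s, a) - mean(A), and a Hebbian trace of the form used in the differentiable plasticity literature, where each connection has a fixed weight `w`, a plasticity coefficient `alpha`, and a decaying trace `hebb`. The function names, the `tanh` nonlinearity, and the trace rate `eta` are illustrative assumptions, not details taken from the paper.

```python
import math


def dueling_q_values(value, advantages):
    """Combine a scalar state value V(s) with per-action advantages A(s, a).

    Assumes the mean-subtracted aggregation: Q = V + A - mean(A).
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def plastic_forward(x, w, alpha, hebb, eta):
    """Run one step of a plastic layer and update its Hebbian trace.

    x:     input activations, length n_in
    w:     fixed weights, shape n_in x n_out (trained by backpropagation)
    alpha: plasticity coefficients, same shape (also trained by backpropagation)
    hebb:  current Hebbian trace, same shape (changes during the agent's lifetime)
    eta:   trace update rate in [0, 1] (assumed hyperparameter)
    Returns (y, new_hebb).
    """
    n_in, n_out = len(x), len(w[0])
    # The effective weight is the fixed part plus the plastic part.
    y = [
        math.tanh(
            sum((w[i][j] + alpha[i][j] * hebb[i][j]) * x[i] for i in range(n_in))
        )
        for j in range(n_out)
    ]
    # Hebbian rule: the trace moves toward the product of pre- and post-activity.
    new_hebb = [
        [(1.0 - eta) * hebb[i][j] + eta * x[i] * y[j] for j in range(n_out)]
        for i in range(n_in)
    ]
    return y, new_hebb
```

In a full agent, `w` and `alpha` would be learned by gradient descent on the Q-learning loss, while `hebb` would keep changing as the agent plays, which is what allows adaptation after the initial training.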
