Neural Network Compression for Reinforcement Learning Tasks
Dmitry A. Ivanov, Denis A. Larionov, Oleg V. Maslennikov, Vladimir V. Voevodin
TL;DR
This work demonstrates that combining pruning and quantization can drastically reduce the size of RL neural networks, with up to 400x total compression, while preserving or slightly improving performance across diverse environments and algorithms. By detailing a gradual pruning schedule, quantization-aware training, and layer-specific quantization strategies, the authors show that both SAC and DQN can operate efficiently on resource-constrained hardware. The results suggest large redundancy in RL networks and motivate co-design of algorithms and hardware to enable edge AI and real-time control. Overall, the study provides a practical pathway to deploying RL in embedded settings without sacrificing performance.
Abstract
In real-world applications of Reinforcement Learning (RL), such as robotics, low-latency and energy-efficient inference is highly desirable. Sparsity and pruning are standard techniques for optimizing neural network inference, particularly for improving energy efficiency and latency. In this work, we systematically investigate applying these optimization techniques to different RL algorithms in different RL environments, achieving up to a 400-fold reduction in the size of the neural networks.
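To give a sense of how pruning and quantization multiply into a figure of this size, here is a minimal back-of-the-envelope sketch. The function name `compression_ratio` and the example values (99% sparsity, 32-bit weights quantized to 8 bits) are illustrative assumptions, not the configuration reported by the authors. The estimate is idealized and ignores the storage overhead of sparse indices.

```python
def compression_ratio(sparsity: float, bits_dense: int = 32, bits_quant: int = 8) -> float:
    """Idealized size reduction from pruning combined with quantization.

    sparsity   -- fraction of weights removed by pruning, in [0, 1)
    bits_dense -- bits per weight before quantization (assumed 32-bit floats)
    bits_quant -- bits per weight after quantization (assumed 8-bit integers)

    Ignores the cost of storing sparse indices, so real savings are smaller.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    if bits_dense <= 0 or bits_quant <= 0:
        raise ValueError("bit widths must be positive")
    pruning_factor = 1.0 / (1.0 - sparsity)   # e.g. 99% sparsity keeps 1 weight in 100
    quant_factor = bits_dense / bits_quant    # e.g. 32-bit to 8-bit gives a factor of 4
    return pruning_factor * quant_factor


if __name__ == "__main__":
    # Hypothetical setting: 99% of weights pruned, remaining weights stored in 8 bits.
    print(f"{compression_ratio(0.99):.0f}x")
```

Under these assumed values the two factors are roughly 100 and 4, which is one way a total reduction of about 400-fold could arise. Other combinations of sparsity and bit width give the same product.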
