Neural Network Compression for Reinforcement Learning Tasks

Dmitry A. Ivanov, Denis A. Larionov, Oleg V. Maslennikov, Vladimir V. Voevodin

TL;DR

This work demonstrates that combining pruning and quantization can drastically reduce the size of RL neural networks (up to 400x total compression) while preserving or slightly improving performance across diverse environments and algorithms. By detailing a gradual pruning schedule, quantization-aware training, and layer-specific quantization strategies, the authors show that both SAC and DQN can operate efficiently on resource-constrained hardware. The results suggest that RL networks are highly redundant and motivate the co-design of algorithms and hardware for edge AI and real-time control. Overall, the study offers a practical pathway to deploying RL in embedded settings without sacrificing performance.

Abstract

In real applications of Reinforcement Learning (RL), such as robotics, low-latency and energy-efficient inference is highly desirable. The use of sparsity and pruning to optimize Neural Network inference, and particularly to improve energy and latency efficiency, is a standard technique. In this work, we perform a systematic investigation of applying these optimization techniques to different RL algorithms in different RL environments, yielding up to a 400-fold reduction in the size of neural networks.
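
As a rough sanity check on how pruning and quantization compound, the sketch below multiplies the two savings. It assumes a 32-bit floating-point baseline, 8-bit quantized weights, and no storage overhead for sparse indices; none of these is specified in this excerpt, and `compression_factor` is a hypothetical helper, not something from the paper.

```python
def compression_factor(sparsity: float, dense_bits: int = 32, quant_bits: int = 8) -> float:
    """Idealized size reduction from pruning plus quantization.

    Assumptions (not taken from the paper): weights are stored at
    `dense_bits` before and `quant_bits` after quantization, and the
    bookkeeping cost of sparse storage (indices, masks) is ignored.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    bit_reduction = dense_bits / quant_bits   # e.g. 32 / 8 = 4x
    weight_reduction = 1.0 / (1.0 - sparsity) # e.g. 99% sparsity -> about 100x
    return bit_reduction * weight_reduction


if __name__ == "__main__":
    for s in (0.0, 0.9, 0.99):
        print(f"sparsity={s:.2f} -> about {compression_factor(s):.0f}x")
```

Under these assumptions, 99% sparsity combined with 8-bit weights would correspond to roughly 400x. Whether this is exactly how the authors arrive at their figure is not stated here.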

Paper Structure

This paper contains 16 sections, 1 equation, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Illustration of a dense NN fitting in DRAM and a sparse, quantized NN fitting in SRAM.
  • Figure 2: Plot of the sparsity function for gradual pruning. The x-axis denotes the pruning step number. The y-axis denotes the neural network sparsity degree.
  • Figure 3: General scheme of training. A randomly initialized neural network is trained for 20% of the total steps in the classical manner. Then, from 20% to 80% of training, gradual pruning is applied in n steps. Pruning is then turned off, and from 80% to 100% of the steps the network is again trained in the classical manner. If the NN also needs to be quantized, an additional 20% of training steps (100-120%) is performed with 8-bit quantization.
  • Figure 4: Results for the SAC algorithm applied to MuJoCo suite environments. The x-axes denote the neural network sparsity degree; the y-axes denote performance (the reward received by the agent). The blue line shows the performance of the pruned network, and the red line shows the performance of the pruned and quantized network. The dotted purple line shows the performance of the quantized-only network, and the green dashed line shows the performance of the default network.
  • Figure 5: Results for the CNN-based DQN algorithm applied to Atari environments. The x-axes denote the neural network sparsity degree; the y-axes denote performance (the reward received by the agent). The blue line shows the performance of the pruned network, and the red line shows the performance of the pruned and quantized network. The dotted purple line shows the performance of the quantized-only network, and the green dashed line shows the performance of the default dense, full-precision network.
  • ...and 1 more figures
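
The training scheme described in the Figure 3 caption can be sketched as a simple step-to-phase mapping. The phase boundaries (20%, 80%, 100%, 120%) come from the caption; the function names and phase labels are hypothetical. The shape of the sparsity curve in Figure 2 is not given in this excerpt, so `sparsity_at` assumes a cubic ramp, which is a common choice for gradual pruning but may differ from the authors' equation.

```python
def training_phase(step: int, total_steps: int, quantize: bool = False) -> str:
    """Map a training step to a phase, following the Figure 3 caption.

    Integer comparisons are used to avoid floating-point boundary issues.
    Phase names are illustrative labels, not terms from the paper.
    """
    if 5 * step < total_steps:          # first 20%: ordinary dense training
        return "dense"
    if 5 * step < 4 * total_steps:      # 20-80%: gradual pruning
        return "pruning"
    if step < total_steps:              # 80-100%: training with pruning off
        return "finetune"
    if quantize and 5 * step < 6 * total_steps:  # 100-120%: 8-bit quantization
        return "quantize"
    return "done"


def sparsity_at(prune_step: int, n_prune_steps: int, final_sparsity: float,
                initial_sparsity: float = 0.0) -> float:
    """Target sparsity after `prune_step` of `n_prune_steps` pruning steps.

    ASSUMPTION: a cubic ramp from initial to final sparsity. The actual
    curve plotted in Figure 2 is not specified in this excerpt.
    """
    t = min(max(prune_step / n_prune_steps, 0.0), 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - t) ** 3


if __name__ == "__main__":
    total = 100
    for step in (0, 20, 50, 80, 100, 119):
        print(step, training_phase(step, total, quantize=True))
    for k in (0, 5, 10):
        print(k, round(sparsity_at(k, 10, final_sparsity=0.9), 4))
```

With a ramp of this shape, most weights are removed early in the pruning window, when the network has the most redundancy, and the schedule flattens as it approaches the target sparsity.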