The Impact of Quantization and Pruning on Deep Reinforcement Learning Models

Heng Lu; Mehdi Alemi; Reza Rawassizadeh

TL;DR

This work addresses the resource demands of deep reinforcement learning by systematically evaluating two compression strategies, quantization and pruning, across five model-free DRL algorithms (TRPO, PPO, DDPG, TD3, SAC) in MuJoCo environments. It compares three quantization variants, post-training dynamic quantization (PTDQ), post-training static quantization (PTSQ), and quantization-aware training (QAT), alongside DepGraph-enabled $L_1$ and $L_2$ pruning, measuring average return, memory, inference time, and energy usage. The key findings are that compression reduces model size, but energy-efficiency and memory benefits are not consistently realized. PTDQ often outperforms the other quantization variants, while PTSQ underperforms due to distribution shifts. Pruning (especially $L_2$) can reduce size with limited impact on performance, but does not guarantee speedups or energy gains. The results provide practical guidelines for deploying compact DRL models on resource-constrained hardware and highlight the trade-offs between model compression and RL performance across varied environments and algorithms.

Abstract

Deep reinforcement learning (DRL) has achieved remarkable success across various domains, such as video games, robotics, and, recently, large language models. However, the computational costs and memory requirements of DRL models often limit their deployment in resource-constrained environments. This challenge underscores the urgent need to explore neural network compression methods that make DRL models more practical and broadly applicable. Our study investigates the impact of two prominent compression methods, quantization and pruning, on DRL models. We examine how these techniques influence four performance factors (average return, memory, inference time, and battery utilization) across various DRL algorithms and environments. Although these compression techniques reduce model size, we find that they generally do not improve the energy efficiency of DRL models. We provide insights into the trade-offs between model compression and DRL performance, offering guidelines for deploying efficient DRL models in resource-constrained settings.

Paper Structure (12 sections, 3 figures, 3 tables)

This paper contains 12 sections, 3 figures, 3 tables.

Introduction and Background
Methods
Quantization
Pruning
Experiments
Experimental Settings
Quantization
Average Return
Resource Utilization
Pruning
Discussions and Findings
Conclusion

Figures (3)

Figure 1: Inference time (in seconds), energy usage (in Joules) and memory utilization (in megabytes) of quantization models.
Figure 2: Inference time, energy usage and RAM of $L_1$ models, scaled by baseline models.
Figure 3: Inference time (in seconds), energy usage (in Joules) and memory utilization (in megabytes) of $L_2$ models, scaled by baseline models.
