RNM-TD3: N:M Semi-structured Sparse Reinforcement Learning From Scratch

Isam Vrce; Andreas Kassler; Gökçe Aydos

TL;DR

RNM-TD3 addresses the poor hardware efficiency of unstructured sparsity in deep reinforcement learning by enforcing row-wise $N{:}M$ sparsity throughout TD3 training, which enables hardware-accelerated sparse matrix multiplication on accelerators that support $N{:}M$ patterns. The method uses a projection operator $\mathcal{P}_{N{:}M}$ and binary masks $E$ to keep all six TD3 networks sparse while applying gradient updates only to active weights; target networks are updated by Polyak averaging for stability. The study introduces Sparse Architecture Divergence (SAD) to analyze how masks evolve during training, and finds that DRL may benefit from less frequent mask updates and that a stable SAD correlates with higher cumulative reward. Experiments on MuJoCo continuous-control tasks show RNM-TD3 matching or surpassing dense baselines at 50–75% sparsity (e.g., 2:4, 1:4) and remaining competitive up to 87.5% sparsity (1:8), with potential training speedups on hardware that supports $N{:}M$ sparse computation.
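The projection step can be sketched in plain Python. This is a minimal illustration, assuming that $\mathcal{P}_{N{:}M}$ keeps the $N$ largest-magnitude weights in every block of $M$ consecutive entries along each row and zeroes the rest. The helper names `nm_mask`, `project_nm`, and `polyak_nm`, the tie-breaking rule, and the choice to re-project the target network after the Polyak (soft) update are all hypothetical and not taken from the paper; a real implementation would operate on framework tensors rather than nested lists.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Mask = List[List[int]]


def nm_mask(weights: Matrix, n: int, m: int) -> Mask:
    """Binary mask keeping the n largest-magnitude weights in every block of
    m consecutive entries along each row (row-wise N:M pattern)."""
    mask: Mask = []
    for row in weights:
        if len(row) % m != 0:
            raise ValueError("row length must be a multiple of m")
        row_mask = [0] * len(row)
        for start in range(0, len(row), m):
            # Rank the m positions of this block by |w|. Python's sort is
            # stable even with reverse=True, so ties keep the lower column.
            ranked = sorted(range(start, start + m),
                            key=lambda i: abs(row[i]), reverse=True)
            for j in ranked[:n]:
                row_mask[j] = 1
        mask.append(row_mask)
    return mask


def project_nm(weights: Matrix, n: int, m: int) -> Tuple[Matrix, Mask]:
    """Hypothetical P_{N:M}: returns (W * E, E) with E = nm_mask(W)."""
    mask = nm_mask(weights, n, m)
    sparse = [[w * e for w, e in zip(wr, er)] for wr, er in zip(weights, mask)]
    return sparse, mask


def polyak_nm(target: Matrix, online: Matrix, tau: float, n: int, m: int) -> Matrix:
    """Soft update target <- tau*online + (1-tau)*target, then re-project.
    Assumption: the source states that targets use Polyak averaging and that
    all networks stay sparse, but not that this exact order is used."""
    mixed = [[tau * o + (1.0 - tau) * t for o, t in zip(orow, trow)]
             for orow, trow in zip(online, target)]
    return project_nm(mixed, n, m)[0]


if __name__ == "__main__":
    W = [[0.9, -0.1, 0.4, -0.7, 0.05, 0.3, -0.2, 0.8]]
    W_sparse, E = project_nm(W, n=2, m=4)
    print(E)         # two kept positions per block of four
    print(W_sparse)  # pruned entries are zero
```

Re-projecting after the soft update is one way to keep a target network in a valid $N{:}M$ pattern: averaging two $N{:}M$ matrices whose masks differ can leave more than $N$ non-zeros in a block.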

Abstract

Sparsity is a well-studied technique for compressing deep neural networks (DNNs) without compromising performance. In deep reinforcement learning (DRL), neural networks that retain as little as 5% of their original weights can still be trained with minimal performance loss compared to their dense counterparts. However, most existing methods rely on unstructured fine-grained sparsity, which limits hardware acceleration opportunities due to irregular computation patterns. Structured coarse-grained sparsity enables hardware acceleration but typically degrades performance and increases pruning complexity. In this work, we present, to the best of our knowledge, the first study of N:M structured sparsity in RL, which balances compression, performance, and hardware efficiency. Our framework enforces row-wise N:M sparsity throughout training for all networks in off-policy RL (TD3), maintaining compatibility with accelerators that support N:M sparse matrix operations. Experiments on continuous-control benchmarks show that RNM-TD3, our N:M sparse agent, outperforms its dense counterpart at 50–75% sparsity (e.g., 2:4 and 1:4), achieving up to a 14% increase in performance at 2:4 sparsity on the Ant environment. RNM-TD3 remains competitive even at 87.5% sparsity (1:8) while enabling potential training speedups.
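The sparsity levels quoted above follow directly from the pattern: an $N{:}M$ pattern keeps $N$ of every $M$ weights, so the pruned fraction is $1 - N/M$. A small check of that arithmetic (the function name `nm_sparsity` is illustrative, not from the paper):

```python
def nm_sparsity(n: int, m: int) -> float:
    """Fraction of weights pruned by an N:M pattern (N kept out of every M)."""
    if not 0 < n <= m:
        raise ValueError("expected 0 < n <= m")
    return 1.0 - n / m


# Patterns named in the abstract.
for n, m in [(2, 4), (1, 4), (1, 8)]:
    print(f"{n}:{m} keeps {n} of every {m} weights -> {nm_sparsity(n, m):.1%} sparsity")
```

This gives 50% for 2:4, 75% for 1:4, and 87.5% for 1:8, matching the ranges reported in the abstract.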

Paper Structure

This paper contains 6 sections, 6 equations, and 2 figures.

INTRODUCTION
RELATED WORK
METHODOLOGY
Problem Formulation & Notation
Training under $N{:}M$ Sparsity
Differences Between $N{:}M$ Sparse Training in DRL and Supervised Learning

Figures (2)

Figure 1: Illustration of a single cycle of the RNM-TD3 algorithm, demonstrated with an $N{:}M$ sparsity pattern where $N=2$ and $M=4$. The weight matrices depicted have dimensions $R \times C$ (rows $\times$ columns). (Left) A dense matrix $W$ and an $N{:}M$ sparse mask $E$ are combined to create the sparse matrix $\tilde{W}$. This sparse matrix is trained for $K$ steps, during which only the entries of the underlying dense matrix at active-weight positions are updated. (Right) After $K$ training steps, the weights of $W$ have changed in magnitude (only at the positions of active weights), which yields a different mask $E$. The updated matrix $W$ and the new mask $E$ create a new sparse matrix $\tilde{W}$. This cycle repeats until the end of training.
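The cycle in Figure 1 can be sketched on a toy problem. This is a minimal sketch under stated assumptions: the mask is applied by element-wise multiplication ($\tilde{W} = W \odot E$), the gradient with respect to $W$ is masked by $E$ so pruned entries never change during the $K$ steps, and a quadratic loss toward a fixed matrix stands in for the actual TD3 actor and critic losses. The names `rnm_cycle` and `nm_mask` and the toy loss are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def nm_mask(w: Matrix, n: int, m: int) -> Matrix:
    """Keep the n largest |w| in each block of m consecutive entries per row."""
    mask = []
    for row in w:
        r = [0.0] * len(row)
        for s in range(0, len(row), m):
            ranked = sorted(range(s, s + m), key=lambda i: abs(row[i]), reverse=True)
            for j in ranked[:n]:
                r[j] = 1.0
        mask.append(r)
    return mask


def hadamard(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise product of two equally shaped matrices."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def rnm_cycle(w: Matrix, target: Matrix, n: int, m: int,
              k: int, lr: float) -> Tuple[Matrix, Matrix]:
    """One cycle: build E from the dense W, take k masked gradient steps on
    W~ = W * E, and return (updated dense W, mask used). The loss
    0.5 * ||W~ - target||^2 is a hypothetical stand-in for the TD3 losses."""
    e = nm_mask(w, n, m)
    for _ in range(k):
        w_sparse = hadamard(w, e)
        # dL/dW~ = W~ - target. Through W~ = W * E the chain rule gives
        # dL/dW = dL/dW~ * E, so positions with E = 0 get zero gradient.
        residual = [[ws - t for ws, t in zip(rs, rt)] for rs, rt in zip(w_sparse, target)]
        grad = hadamard(residual, e)
        w = [[wi - lr * gi for wi, gi in zip(rw, rg)] for rw, rg in zip(w, grad)]
    return w, e


if __name__ == "__main__":
    W = [[0.5, 0.4, -0.1, 0.05]]
    target = [[0.0, 0.0, 1.0, 0.9]]
    W, E_old = rnm_cycle(W, target, n=2, m=4, k=3, lr=0.5)
    E_new = nm_mask(W, n=2, m=4)
    print(E_old, E_new)  # [[1.0, 1.0, 0.0, 0.0]] [[1.0, 0.0, 1.0, 0.0]]
```

With these inputs the first mask keeps columns 0 and 1; after three steps both active weights have shrunk while the pruned ones are untouched, so the recomputed mask keeps columns 0 and 2, mirroring the mask change on the right of Figure 1.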