Topology-Aware Revival for Efficient Sparse Training
Meiling Jin, Fei Wang, Xiaoyun Yuan, Chen Qian, Yuan Cheng
TL;DR
This work analyzes the brittleness of static sparse training (SST) under non-stationary data distributions in reinforcement learning. It introduces Topology-Aware Revival (TAR), a lightweight one-shot post-pruning procedure that allocates a small revival budget across layers using a topology proxy and a connectivity floor, then revives a uniformly random subset of previously pruned connections and keeps the resulting connectivity fixed for the remainder of training. TAR is motivated theoretically by a coverage bound for random revival. Empirically, on SAC/TD3 tasks it improves on static baselines by up to +37.9% and on dynamic sparse training by a median of +13.5%, with benefits that scale when networks are widened. The approach preserves the simplicity and low overhead of SST while mitigating structural bottlenecks arising from distribution drift, making static sparse training more robust and practical for non-stationary RL scenarios.
Abstract
Static sparse training is a promising route to efficient learning because it commits to a fixed mask pattern, yet the constrained structure reduces robustness. Early pruning decisions can lock the network into a brittle structure that is difficult to escape, especially in deep reinforcement learning (RL), where the evolving policy continually shifts the training distribution. We propose Topology-Aware Revival (TAR), a lightweight one-shot post-pruning procedure that improves static sparsity without dynamic rewiring. After static pruning, TAR performs a single revival step: it allocates a small reserve budget across layers according to topology needs, reactivates a few previously pruned connections chosen uniformly at random within each layer, and then keeps the resulting connectivity fixed for the remainder of training. Across multiple continuous-control tasks with SAC and TD3, TAR improves final return over static sparse baselines by up to +37.9% and also outperforms dynamic sparse training baselines with a median gain of +13.5%.
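The revival step described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the topology proxy or the form of the connectivity floor, so the sketch assumes layer sparsity as a stand-in proxy and a minimum per-layer density as the floor. All function and parameter names (`allocate_revival_budget`, `revive`, `topology_aware_revival`, `floor`) are hypothetical.

```python
import numpy as np


def allocate_revival_budget(masks, budget, floor=0.05):
    """Split a revival budget across layers (illustrative assumptions only).

    masks: list of boolean arrays, True marks an active connection.
    floor: ASSUMED connectivity floor, a minimum fraction of active weights.
    Returns an integer array with the number of connections to revive per layer.
    """
    sizes = np.array([m.size for m in masks])
    active = np.array([int(m.sum()) for m in masks])
    pruned = sizes - active

    # Step 1: spend budget to lift every layer up to the assumed density floor.
    floor_need = np.maximum(np.ceil(floor * sizes).astype(int) - active, 0)
    alloc = np.minimum(floor_need, pruned)
    if alloc.sum() > budget:
        raise ValueError("budget too small to satisfy the connectivity floor")

    # Step 2: split the remainder in proportion to a stand-in topology proxy.
    # ASSUMPTION: the proxy is plain layer sparsity (fraction of pruned weights).
    remaining = budget - alloc.sum()
    capacity = pruned - alloc
    if remaining > 0 and capacity.sum() > 0:
        proxy = pruned / sizes
        share = np.floor(remaining * proxy / proxy.sum()).astype(int)
        alloc = alloc + np.minimum(share, capacity)
    return alloc


def revive(mask, k, rng):
    """Reactivate k pruned connections chosen uniformly at random."""
    new_mask = mask.copy()
    if k == 0:
        return new_mask
    pruned_idx = np.flatnonzero(~mask)
    chosen = rng.choice(pruned_idx, size=k, replace=False)
    new_mask.flat[chosen] = True
    return new_mask


def topology_aware_revival(masks, budget, floor=0.05, seed=0):
    """One-shot revival: allocate, revive once, then keep the masks fixed."""
    rng = np.random.default_rng(seed)
    alloc = allocate_revival_budget(masks, budget, floor)
    return [revive(m, int(k), rng) for m, k in zip(masks, alloc)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two layers of a small MLP, each pruned to roughly 10% density.
    masks = [rng.random((64, 64)) < 0.1, rng.random((64, 8)) < 0.1]
    new_masks = topology_aware_revival(masks, budget=200)
    for old, new in zip(masks, new_masks):
        print(old.shape, int(old.sum()), "->", int(new.sum()))
```

The returned masks would then be applied unchanged for the rest of training, which is what distinguishes this one-shot step from dynamic sparse training, where connectivity is rewired repeatedly. Because per-layer shares are rounded down, the sketch may revive slightly fewer connections than the budget allows.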
