Dynamic Sparse Training versus Dense Training: The Unexpected Winner in Image Corruption Robustness
Boqian Wu, Qiao Xiao, Shunxin Wang, Nicola Strisciuglio, Mykola Pechenizkiy, Maurice van Keulen, Decebal Constantin Mocanu, Elena Mocanu
TL;DR
This work questions the default use of dense training for robustness to image corruptions and demonstrates that Dynamic Sparse Training (DST) at low to moderate sparsity can surpass dense models in corruption robustness without extra resource costs. By validating the DSCR hypothesis across image and video datasets, architectures (including CNNs and transformers), and multiple DST algorithms (SET, RigL, MEST, GraNet), the authors reveal that DST acts as an implicit regularizer that biases learning toward low-frequency features. They provide spatial- and spectral-domain explanations for this robustness, showing reduced reliance on high-frequency content and focusing the model on more informative features. The findings suggest a practical shift toward DST as a robustness-boosting, resource-efficient training paradigm with broad applicability, including segmentation and real-world video tasks, and motivate further theoretical and methodological advances in robust sparse learning.
Abstract
It is generally perceived that Dynamic Sparse Training opens the door to a new era of scalability and efficiency for artificial neural networks at, perhaps, some costs in accuracy performance for the classification task. At the same time, Dense Training is widely accepted as being the "de facto" approach to train artificial neural networks if one would like to maximize their robustness against image corruption. In this paper, we question this general practice. Consequently, we claim that, contrary to what is commonly thought, the Dynamic Sparse Training methods can consistently outperform Dense Training in terms of robustness accuracy, particularly if the efficiency aspect is not considered as a main objective (i.e., sparsity levels between 10% and up to 50%), without adding (or even reducing) resource cost. We validate our claim on two types of data, images and videos, using several traditional and modern deep learning architectures for computer vision and three widely studied Dynamic Sparse Training algorithms. Our findings reveal a new yet-unknown benefit of Dynamic Sparse Training and open new possibilities in improving deep learning robustness beyond the current state of the art.
