
Dynamic Sparse Training versus Dense Training: The Unexpected Winner in Image Corruption Robustness

Boqian Wu, Qiao Xiao, Shunxin Wang, Nicola Strisciuglio, Mykola Pechenizkiy, Maurice van Keulen, Decebal Constantin Mocanu, Elena Mocanu

TL;DR

This work questions the default use of dense training for robustness to image corruptions and demonstrates that Dynamic Sparse Training (DST) at low to moderate sparsity can surpass dense models in corruption robustness without extra resource costs. By validating the DSCR hypothesis (that DST at such sparsity levels improves corruption robustness over dense training) across image and video datasets, architectures (including CNNs and transformers), and multiple DST algorithms (SET, RigL, MEST, GraNet), the authors show that DST acts as an implicit regularizer that biases learning toward low-frequency features. They provide spatial- and spectral-domain explanations for this robustness: DST-trained models rely less on high-frequency content and focus on more informative features. The findings suggest a practical shift toward DST as a robustness-boosting, resource-efficient training paradigm with broad applicability, including segmentation and real-world video tasks, and motivate further theoretical and methodological advances in robust sparse learning.
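
The spectral explanation relies on separating an image into low- and high-frequency bands. The sketch below is a minimal, dependency-free illustration under stated assumptions, not the authors' analysis code: it takes a square grayscale image as a list of lists, uses a naive 2-D DFT, and treats frequencies inside a circular mask of a hypothetical `radius` as the low-frequency band. Evaluating a trained model on `low` versus `high` is one common way to probe which band it relies on.

```python
import cmath


def dft2(img):
    """Naive 2-D discrete Fourier transform of an n x n image (list of lists)."""
    n = len(img)
    return [[sum(img[x][y] * cmath.exp(-2j * cmath.pi * (u * x + v * y) / n)
                 for x in range(n) for y in range(n))
             for v in range(n)]
            for u in range(n)]


def idft2(spec):
    """Inverse of dft2; returns the real part of the reconstructed image."""
    n = len(spec)
    return [[(sum(spec[u][v] * cmath.exp(2j * cmath.pi * (u * x + v * y) / n)
                  for u in range(n) for v in range(n)) / (n * n)).real
             for y in range(n)]
            for x in range(n)]


def low_high_split(img, radius):
    """Split img into low- and high-frequency parts (hypothetical helper).

    Frequencies whose signed index (fu, fv) satisfies fu^2 + fv^2 <= radius^2
    form the low band; the remainder is the high band, so low + high == img.
    """
    n = len(img)
    spec = dft2(img)
    low_spec = [[0j] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            # Map DFT indices to signed frequencies so the mask is centered on DC
            # and symmetric, which keeps the filtered image real-valued.
            fu = u if u <= n // 2 else u - n
            fv = v if v <= n // 2 else v - n
            if fu * fu + fv * fv <= radius * radius:
                low_spec[u][v] = spec[u][v]
    low = idft2(low_spec)
    high = [[img[x][y] - low[x][y] for y in range(n)] for x in range(n)]
    return low, high
```

For example, a 4×4 checkerboard lies entirely at the highest frequency, so `low_high_split(checker, 1)` returns a near-zero `low` and places the whole pattern in `high`.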

Abstract

It is generally perceived that Dynamic Sparse Training opens the door to a new era of scalability and efficiency for artificial neural networks, perhaps at some cost in classification accuracy. At the same time, Dense Training is widely accepted as the de facto approach for training artificial neural networks when the goal is to maximize their robustness against image corruption. In this paper, we question this general practice. Specifically, we claim that, contrary to common belief, Dynamic Sparse Training methods can consistently outperform Dense Training in terms of robustness accuracy, particularly when efficiency is not the main objective (i.e., at sparsity levels between 10% and 50%), without adding (and even while reducing) resource cost. We validate our claim on two types of data, images and videos, using several traditional and modern deep learning architectures for computer vision and three widely studied Dynamic Sparse Training algorithms. Our findings reveal a previously unknown benefit of Dynamic Sparse Training and open new possibilities for improving deep learning robustness beyond the current state of the art.
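
Dynamic Sparse Training algorithms such as SET and RigL share a prune-and-regrow loop that keeps the number of active weights, and hence the sparsity level, constant during training. The sketch below illustrates one such update on a flat weight vector under simplifying assumptions: magnitude-based pruning of a hypothetical fraction `drop_frac` of active weights, then either SET-style random regrowth or RigL-style regrowth by dense gradient magnitude. The helper names `random_mask` and `dst_update` are hypothetical; regrown weights start at zero here for simplicity, and published implementations differ in initialization, update schedules, and layer-wise sparsity allocation. MEST and GraNet add further mechanisms not shown.

```python
import random


def random_mask(n, sparsity, rng=random):
    """Binary mask with round(n * (1 - sparsity)) active positions chosen at random."""
    k = round(n * (1 - sparsity))
    active = set(rng.sample(range(n), k))
    return [1 if i in active else 0 for i in range(n)]


def dst_update(weights, mask, drop_frac, grads=None, rng=random):
    """One prune-and-regrow step that keeps the number of active weights fixed.

    grads=None -> SET-style random regrowth; otherwise RigL-style regrowth at the
    inactive positions with the largest |gradient|. Hypothetical helper.
    """
    active = [i for i, m in enumerate(mask) if m]
    n_drop = int(drop_frac * len(active))
    # Prune: deactivate the n_drop active weights with the smallest magnitude.
    to_drop = sorted(active, key=lambda i: abs(weights[i]))[:n_drop]
    new_weights, new_mask = list(weights), list(mask)
    for i in to_drop:
        new_mask[i] = 0
        new_weights[i] = 0.0
    # Candidates for regrowth: every position that is inactive after pruning.
    inactive = [i for i, m in enumerate(new_mask) if not m]
    if grads is None:
        to_grow = rng.sample(inactive, n_drop)
    else:
        to_grow = sorted(inactive, key=lambda i: abs(grads[i]), reverse=True)[:n_drop]
    for i in to_grow:
        new_mask[i] = 1
        new_weights[i] = 0.0  # assumption: regrown weights start at zero
    return new_weights, new_mask
```

In a training loop, such an update would be applied periodically between ordinary gradient steps, with the mask also applied to the gradient updates so that inactive weights stay at zero.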

Paper Structure (31 sections, 1 equation, 19 figures, 12 tables)


Figures (19)

  • Figure 1: (Left) Robustness accuracy gain (%) of a sparsified ResNet34 trained with SET (sparsity ratio=$0.5$) compared to its dense counterpart, tested on CIFAR100-C with the corruption types shown on the Y-axis. Positive values indicate better performance by the SET-trained model. (Right) Conceptual representation of model density during training for different DST algorithms: (a) SET, (b) RigL (Note: in the regrow step, RigL requires full gradient calculations.), (c) MEST and (d) GraNet.
  • Figure 2: Robustness accuracy (%) for (a) VGG16 on CIFAR10-C, (b) ResNet34 on CIFAR100-C and (c) EfficientNet-B0 on TinyImageNet-C, comparing different DST algorithms with dense training. Left: Results for DST algorithms with random regrow strategy, including SET, MEST$_r$, and GraNet$_r$. Right: Results for DST algorithms with gradient-based regrow strategy, such as RigL, MEST$_g$, and GraNet$_g$.
  • Figure 3: Relative robustness accuracy gain (%) on ImageNet-C (top) and ImageNet-$\bar{\mathbf{C}}$ (bottom) for ResNet50, trained using DST with gradient-based strategies (i.e., (a) RigL, (b) MEST$_g$, (c) GraNet$_g$) at a sparsity ratio of 0.1, compared to a dense baseline (which has a mean robustness accuracy of $38.38\%$ on ImageNet-C and $40.38\%$ on ImageNet-$\bar{\mathbf{C}}$). Positive values reflect better performance by the sparse models compared to the dense baseline. The five bars, ranging from light to dark shades, represent corruption severity levels from 1 to 5 for each type of corruption. The title indicates the mean robustness accuracy for each corresponding DST method.
  • Figure 4: Relative robustness accuracy gain (%) on ImageNet-C for a DeiT-base, dynamically trained with gradient-based methods (i.e., (a) RigL, (b) MEST$_g$, (c) GraNet$_g$) at a sparsity ratio of $0.1$, compared to a dense baseline (with a robustness accuracy of $54.68\%$ on ImageNet-C). Positive values indicate superior performance by the sparse models.
  • Figure 5: Visualization of the non-zero weight count (1st row) and the sum of weight magnitudes (2nd row) within a $3\times3$ kernel from layer3.3.conv2 of ResNet34 after training on CIFAR100 with dense training or different DST methods (sparsity ratio of $0.5$). Each value in the figure is computed within the kernel, with lighter colors indicating larger values.
  • ...and 14 more figures
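
Figure 5 summarizes a convolution layer by kernel position. A minimal sketch of that kind of aggregation, assuming (our reading of the caption, not a confirmed procedure) that each cell of the $3\times3$ map counts non-zero weights and sums absolute weight values across all input and output channels of the layer. The helper `kernel_position_stats` is hypothetical, and the weight is given as nested lists of shape [out][in][k][k]; in practice it would be extracted from a framework tensor.

```python
def kernel_position_stats(weight):
    """Aggregate a conv weight of shape [out][in][k][k] into two k x k maps.

    Returns (nonzero, magnitude): per kernel position, the number of non-zero
    weights and the sum of |w| over all input/output channel pairs.
    """
    k = len(weight[0][0])
    nonzero = [[0] * k for _ in range(k)]
    magnitude = [[0.0] * k for _ in range(k)]
    for out_channel in weight:
        for kernel in out_channel:
            for r in range(k):
                for c in range(k):
                    w = kernel[r][c]
                    if w != 0:
                        nonzero[r][c] += 1
                    magnitude[r][c] += abs(w)
    return nonzero, magnitude
```

A dense layer yields the channel count in every cell of `nonzero`, so differences between that map and the dense one show where sparse training has concentrated the remaining weights.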