Exploring The Neural Burden In Pruned Models: An Insight Inspired By Neuroscience

Zeyu Wang, Weichen Dai, Xiangyu Zhou, Ji Qi, Yi Zhou

TL;DR

The paper investigates why pruning during training degrades performance in Vision Transformers and introduces Neural Burden, a neuroscience‑inspired view that artificial neurons become overloaded as they compensate for pruned information. It proposes a simple, plug‑and‑play mitigation framework combining Single‑Pass Token Merging for data compression and a Persistent Dynamic Weight Pruning strategy to reduce neuron load and allow re‑activation. The authors formalize the burden with cost functions that resemble reconstruction and sparsity tradeoffs, and validate the approach on MNIST and CIFAR‑10 using a four‑layer ViT, showing neural burden exists and data compression can recover much of the lost performance under high sparsity. The work highlights the potential of integrating neuroscience insights with adaptive, compression‑aware pruning to improve efficiency without sacrificing accuracy in sparse Transformer models.

Abstract

Vision Transformer and its variants have been adopted in many visual tasks due to their powerful capabilities, which also bring significant challenges in computation and storage. Consequently, researchers have introduced various compression methods in recent years, among which pruning techniques are widely used to remove a significant fraction of the network. These methods can eliminate a significant percentage of the FLOPs, but often lead to a decrease in model performance. To investigate the underlying causes, we focus on pruning methods belonging to the pruning-during-training category, then draw inspiration from neuroscience and propose a new concept for artificial neural network models named Neural Burden. We investigate its impact on the model pruning process, and subsequently explore a simple yet effective approach to mitigate the decline in model performance, which can be applied to any pruning-during-training technique. Extensive experiments indicate that the neural burden phenomenon indeed exists, and show the potential of our method. We hope that our findings can provide valuable insights for future research. Code will be made publicly available after this paper is published.

Paper Structure (19 sections, 13 equations, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Overview of our proposed joint compression framework. We use a bipartite matching method to perform token merging on the data before it is sent to the transformer encoder, thereby compressing the data. Simultaneously, a dynamic pruning method is employed throughout the training process. To validate the effectiveness of the framework, both algorithms are implemented using relatively simple approaches commonly used in their respective fields.
  • Figure 2: For each subplot, the first row shows the cumulative mean of gradients/weights for neurons recorded in each epoch, and the second row shows the mean of gradients/weights for neurons recorded every 100 steps. The blue curve represents the dynamic iterative pruning experiment, the yellow curve represents the experiment of training from scratch using the optimal subnetwork, and the green curve represents the experiment of training from scratch using a random subnetwork.
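
The two components named in the Figure 1 caption can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which is not given on this page. It assumes a ToMe-style bipartite matching (tokens split alternately into two sets, the most similar pairs averaged in one pass) and a plain magnitude-based mask that is recomputed from the dense weights, so a previously pruned weight can become active again. The function names `bipartite_merge` and `magnitude_mask` and the parameters `r` and `sparsity` are hypothetical.

```python
import numpy as np


def bipartite_merge(tokens: np.ndarray, r: int) -> np.ndarray:
    """Merge r tokens into their nearest partner in a single pass.

    tokens: (n, d) array of token embeddings for one image.
    Returns an (n - r, d) array; token order is not preserved.
    Assumption: ToMe-style alternating split with cosine similarity.
    """
    a, b = tokens[0::2], tokens[1::2]            # split tokens into sets A and B
    r = min(r, a.shape[0])                        # cannot merge more than |A| tokens
    if r <= 0 or b.shape[0] == 0:
        return tokens.copy()

    # Cosine similarity between every token in A and every token in B.
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    sim = a_n @ b_n.T                             # shape (|A|, |B|)

    best_b = sim.argmax(axis=1)                   # best partner in B for each A token
    best_sim = sim.max(axis=1)
    order = np.argsort(-best_sim)                 # A tokens, most similar edge first
    merged_a, kept_a = order[:r], order[r:]

    # Average each merged A token into its partner (several may share a partner).
    sums = b.astype(float)                        # astype returns a new array
    counts = np.ones(b.shape[0])
    np.add.at(sums, best_b[merged_a], a[merged_a])
    np.add.at(counts, best_b[merged_a], 1.0)
    merged_b = sums / counts[:, None]

    return np.concatenate([a[kept_a], merged_b], axis=0)


def magnitude_mask(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Boolean mask keeping the largest-magnitude weights.

    Assumption: simple global magnitude criterion. Ties at the threshold
    are pruned, so slightly more than the requested fraction may be removed.
    """
    k = int(round(sparsity * weights.size))       # number of weights to prune
    if k == 0:
        return np.ones(weights.shape, dtype=bool)
    magnitudes = np.abs(weights)
    threshold = np.partition(magnitudes.ravel(), k - 1)[k - 1]
    return magnitudes > threshold
```

In a training loop under these assumptions, `bipartite_merge` would be applied to each image's patch tokens before the encoder, and `magnitude_mask` would be recomputed periodically from the dense weights and multiplied into them for the forward pass, which leaves room for pruned weights to re-enter the network later.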