Synaptic Pruning: A Biological Inspiration for Deep Learning Regularization

Gideon Vos, Liza van Eijk, Zoltan Sarnyai, Mostafa Rahimi Azghadi

TL;DR

This paper addresses a limitation of conventional dropout, which deactivates neurons at random without regard to their importance, by proposing a biology-inspired, magnitude-based synaptic pruning method that progressively eliminates low-importance connections during training. The approach integrates permanent pruning masks into the training loop with a cubic sparsity schedule and global weight ranking, and is applied to RNN, LSTM, and PatchTST architectures for time-series forecasting. Across four diverse datasets, the method yields consistent MAE improvements, with reductions of up to 20% in financial forecasting and up to 52% in select transformer models, at modest overhead, demonstrating its practicality as a regularization and compression technique. The work highlights the potential of biologically inspired pruning to enhance generalization and efficiency, particularly in financial time-series applications, and points to future work in scalability and broader architecture validation.
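
To make the cubic sparsity schedule concrete, here is a minimal sketch of one common cubic ramp from the gradual-pruning literature. The paper's exact equation is not reproduced in this summary, so the functional form, the function name `cubic_sparsity`, and its parameters are assumptions for illustration only.

```python
def cubic_sparsity(step, total_steps, final_sparsity, initial_sparsity=0.0):
    """Target global sparsity at `step` (assumed cubic ramp, not the paper's exact formula)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps past the end hold the final sparsity.
    progress = min(max(step / total_steps, 0.0), 1.0)
    # The cubic term makes sparsity rise quickly at first and flatten near the target.
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3
```

With `total_steps=100` and `final_sparsity=0.5`, this ramp starts at 0.0, reaches 0.4375 at the halfway point, and settles at 0.5, so most connections are removed early while the network still has time to adapt.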

Abstract

Synaptic pruning in biological brains removes weak connections to improve efficiency. In contrast, dropout regularization in artificial neural networks randomly deactivates neurons without considering activity-dependent pruning. We propose a magnitude-based synaptic pruning method that better reflects biology by progressively removing low-importance connections during training. Integrated directly into the training loop as a dropout replacement, our approach computes weight importance from absolute magnitudes across layers and applies a cubic schedule to gradually increase global sparsity. At fixed intervals, pruning masks permanently remove low-importance weights while maintaining gradient flow for active ones, eliminating the need for separate pruning and fine-tuning phases. Experiments on multiple time series forecasting models including RNN, LSTM, and Patch Time Series Transformer across four datasets show consistent gains. Our method ranked best overall, with statistically significant improvements confirmed by Friedman tests (p < 0.01). In financial forecasting, it reduced Mean Absolute Error by up to 20% over models with no or standard dropout, and up to 52% in select transformer models. This dynamic pruning mechanism advances regularization by coupling weight elimination with progressive sparsification, offering easy integration into diverse architectures. Its strong performance, especially in financial time series forecasting, highlights its potential as a practical alternative to conventional dropout techniques.
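
The pruning step described in the abstract (global ranking by absolute magnitude, permanent masks, and continued updates for surviving weights) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the helper names `update_masks` and `masked_sgd_step` are hypothetical, layers are represented as flat Python lists rather than framework tensors, and plain SGD stands in for whatever optimizer the paper uses.

```python
def update_masks(weights, masks, target_sparsity):
    """Zero out the smallest-magnitude active weights, ranked across all layers.

    weights, masks: dicts mapping a layer name to a flat list (hypothetical layout).
    Masks only ever change from 1 to 0, so pruning is permanent.
    """
    total = sum(len(layer) for layer in weights.values())
    n_target = round(target_sparsity * total)
    n_pruned = sum(m.count(0) for m in masks.values())
    n_new = n_target - n_pruned
    if n_new <= 0:
        return masks  # target already met; never regrow pruned weights
    # Global ranking: gather (|w|, layer, index) for every weight still active.
    alive = [(abs(w), name, i)
             for name, layer in weights.items()
             for i, w in enumerate(layer)
             if masks[name][i] == 1]
    alive.sort(key=lambda item: item[0])
    for _, name, i in alive[:n_new]:
        masks[name][i] = 0
        weights[name][i] = 0.0
    return masks


def masked_sgd_step(weights, grads, masks, lr):
    """Plain SGD update; the mask blocks updates to pruned weights only."""
    for name, layer in weights.items():
        for i in range(len(layer)):
            layer[i] -= lr * grads[name][i] * masks[name][i]
```

In a training loop, `update_masks` would be called at fixed intervals with a gradually increasing target sparsity, and `masked_sgd_step` on every batch, so no separate fine-tuning phase is needed.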

Paper Structure

This paper contains 23 sections, 1 equation, 6 figures, 4 tables, 6 algorithms.

Figures (6)

  • Figure 1: Comparison of standard dropout and our synaptic pruning method implemented on LSTM model architectures. Dropout temporarily deactivates neurons during training, whereas pruning permanently removes specific connections, resulting in lasting sparsity and improved efficiency.
  • Figure 2: Comparison of standard dropout and our synaptic pruning method implemented on PatchTST model architectures. Dropout temporarily deactivates neurons during training, whereas pruning permanently removes specific connections, resulting in lasting sparsity and improved efficiency.
  • Figure 3: Error rate comparison of regularization methods implemented on RNN model architectures.
  • Figure 4: Error rate comparison of regularization methods implemented on LSTM model architectures.
  • Figure 5: Error rate comparison of regularization methods implemented on PatchTST model architectures.
  • ...and 1 more figure