Dynamic DropConnect: Enhancing Neural Network Robustness through Adaptive Edge Dropping Strategies

Yuan-Chih Yang, Hung-Hsuan Chen

TL;DR

DynamicDropConnect (DDC) tackles overfitting by replacing DropConnect's single fixed drop rate with per-edge drop probabilities derived from gradient magnitudes. The method generates a gradient-informed mask using layer-wise normalization and a candidate drop probability $q_{i,j}^{(l)}$, combines it with a base rate $p$ and a gradient rate $p_g$ to produce the final drop probability $p_{i,j}^{(l)}$, and then calibrates training outputs so that inference can use the original weights. Across synthetic data, multiple open datasets (MNIST, CIFAR-10/100, NORB), and several architectures (SimpleCNN, AlexNet, VGG), DDC consistently outperforms Dropout, DropConnect, and Standout, with higher accuracy and lower variance. Because it adds no learnable parameters, the approach provides a robust, scalable regularization mechanism, and it suggests promising theoretical avenues linking gradient-driven dropping to Bayesian perspectives. The work includes public code to enable replication and further exploration of gradient-based edge dropping.
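
The combination of $p$, $p_g$, and $q_{i,j}^{(l)}$ described above can be illustrated with a short NumPy sketch. It is not the authors' implementation: this summary does not state the exact formulas, so the min-max normalization, the convex mix $(1 - p_g)\,p + p_g\,q_{i,j}^{(l)}$, and the rescaling by the inverse keep probability are all assumptions, as are the function names `dynamic_drop_probs` and `masked_weights`. The choice to drop small-gradient edges more often follows the Figure 1 caption.

```python
import numpy as np


def dynamic_drop_probs(grad, p=0.5, p_g=0.5, eps=1e-12):
    """Per-edge drop probabilities for one layer (illustrative assumption)."""
    g = np.abs(grad)
    # Assumed layer-wise min-max normalization of gradient magnitudes to [0, 1].
    norm = (g - g.min()) / (g.max() - g.min() + eps)
    # Assumed candidate drop probability q: small gradients are dropped more often.
    q = 1.0 - norm
    # Assumed convex mix of the base rate p and the candidate q, weighted by p_g.
    probs = (1.0 - p_g) * p + p_g * q
    # Keep probabilities strictly below 1 so the rescaling below is well defined.
    return np.clip(probs, 0.0, 1.0 - 1e-6)


def masked_weights(W, probs, rng):
    """Sample a per-edge mask and rescale the surviving weights (assumption)."""
    keep = rng.random(W.shape) >= probs
    # Dividing by the keep probability makes the expected masked weight equal W,
    # which is one way to let inference use the original weights unchanged.
    return np.where(keep, W / (1.0 - probs), 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 4))       # weights of one toy layer
    grad = rng.normal(size=(3, 4))    # stand-in for dLoss/dW of that layer
    probs = dynamic_drop_probs(grad, p=0.3, p_g=0.5)
    print(probs.round(3))
    print(masked_weights(W, probs, rng).round(3))
```

Under these assumptions, setting `p_g=0` gives every edge the same rate `p`, which corresponds to plain DropConnect, while larger `p_g` values let the gradient magnitudes dominate the mask.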

Abstract

Dropout and DropConnect are well-known techniques that apply a consistent drop rate to randomly deactivate neurons or edges in a neural network layer during training. This paper introduces a novel methodology that assigns a dynamic drop rate to each edge within a layer, tailoring the dropping process without introducing additional learnable parameters. We perform experiments on synthetic and openly available datasets to validate the effectiveness of our approach. The results demonstrate that our method outperforms Dropout, DropConnect, and Standout, a classic mechanism known for its adaptive dropout capabilities. Furthermore, our approach improves the robustness and generalization of neural network training without increasing computational complexity. The complete implementation of our methodology is publicly accessible for research and replication purposes at https://github.com/ericabd888/Adjusting-the-drop-probability-in-DropConnect-based-on-the-magnitude-of-the-gradient/.

Paper Structure

This paper contains 10 sections, 6 equations, 2 figures, 4 tables, 2 algorithms.

Figures (2)

  • Figure 1: The contour plot of the loss and the parameter update process. The two rows represent two sets of initial values and their updating process. The blue pluses denote the initial values. The red stars, triangles, and diamonds represent the values of $w_1$ and $w_2$ after training for 7, 13, and 19 epochs, respectively. The method that tends to drop the edges with small gradients (the second column) reaches a small error faster.
  • Figure 2: Loss vs. epochs of different dropping strategies.