Y-Drop: A Conductance based Dropout for fully connected layers

Efthymios Georgiou, Georgios Paraskevopoulos, Alexandros Potamianos

TL;DR

This work introduces Y-Drop, a regularization method that biases the dropout algorithm towards dropping more important neurons with higher probability. Importance is measured with neuron conductance, an interpretable measure that calculates the contribution of each neuron towards the end-to-end mapping of the network.

Abstract

In this work, we introduce Y-Drop, a regularization method that biases the dropout algorithm towards dropping more important neurons with higher probability. The backbone of our approach is neuron conductance, an interpretable measure of neuron importance that calculates the contribution of each neuron towards the end-to-end mapping of the network. We investigate the impact of the uniform dropout selection criterion on performance by assigning higher dropout probability to the more important units. We show that forcing the network to solve the task at hand in the absence of its important units yields a strong regularization effect. Further analysis indicates that Y-Drop yields solutions where more neurons are important, i.e., have high conductance, and yields robust networks. In our experiments we show that the regularization effect of Y-Drop scales better than vanilla dropout w.r.t. the architecture size and consistently yields superior performance over multiple datasets and architecture combinations, with little tuning.
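
The idea of assigning higher drop probability to more important units can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the mapping from conductance to drop probability, so the proportional rule, the function names `importance_to_drop_prob` and `biased_dropout`, and the parameters `mean_p` and `max_p` are all hypothetical and are not the authors' formulation.

```python
import numpy as np

def importance_to_drop_prob(scores, mean_p=0.5, max_p=0.95):
    """Map per-unit importance scores to per-unit drop probabilities.

    Assumption: probabilities are proportional to |score| and scaled so
    that their mean equals mean_p before clipping. Clipping at max_p can
    lower the final mean. This rule is hypothetical, not the paper's.
    """
    scores = np.abs(np.asarray(scores, dtype=float))
    total = scores.sum()
    if total == 0.0:
        # No importance information: fall back to uniform dropout.
        return np.full(scores.shape, mean_p)
    probs = mean_p * scores.size * scores / total
    return np.clip(probs, 0.0, max_p)

def biased_dropout(activations, drop_prob, rng):
    """Drop unit j with probability drop_prob[j]; rescale the survivors."""
    keep = rng.random(activations.shape) >= drop_prob
    # Inverted-dropout scaling keeps the expected activation unchanged.
    return activations * keep / (1.0 - drop_prob)

# Usage with made-up importance scores for a layer of four units.
rng = np.random.default_rng(0)
probs = importance_to_drop_prob([4.0, 1.0, 1.0, 2.0])
print(probs)  # [0.95 0.25 0.25 0.5 ]
dropped = biased_dropout(np.ones((3, 4)), probs, rng)
```

With equal scores the rule reduces to uniform dropout at rate `mean_p`, which makes it easy to compare against the vanilla baseline.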

Paper Structure (17 sections, 9 equations, 3 figures, 3 tables)


Figures (3)

  • Figure 1: Y-Drop consists of two phases during each training step: conductance calculation and network update. To calculate conductance, we first interpolate between a given sample and an uninformative sample (e.g. a black image) and feed the interpolated samples to the network. For every unit in each layer, conductance is calculated based on the unit's activations (green forward pass) and the unit's partial derivatives (green backward pass) for all interpolated samples. Darker colors denote units with higher per-layer conductance. During the second phase, we use the conductance score of each unit to determine its drop probability, and the network parameters are updated through backpropagation. The curved arrows denote the transitions between phases.
  • Figure 2: Illustration of the average neuron conductance scores for $1024$ units in a single-layer network trained on MNIST, using Y-Drop (green), Dropout (orange) and no regularization / Plain (purple). Fig. \ref{fig:mean-cond} shows the mean neuron conductance of the three models. Fig. \ref{fig:cumsum-cond} shows the cumulative sum of conductance over units. Colored numbers in Fig. \ref{fig:cumsum-cond} indicate the percentage of the total layer conductance when the top $200, 400, 600$ or $800$ units are taken into consideration. Units in both figures are sorted from highest conductance score to lowest.
  • Figure 3: Performance of networks trained with Y-Drop (green), Dropout (orange) and no regularization / Plain (purple), when we drop a progressively higher percentage of units during inference. In Fig. \ref{fig:random_mask} we drop units randomly. In Fig. \ref{fig:cond_mask} we first drop units with higher conductance scores. Both figures end when $99\%$ of units are dropped.
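
The conductance calculation described in the Figure 1 caption (interpolate between a sample and an uninformative sample, then combine each unit's activations and partial derivatives over the interpolated samples) can be sketched as below. This is a minimal sketch, not the authors' implementation: the toy network (one ReLU hidden layer with a scalar tanh output), its hand-picked weights, the Riemann-sum approximation, and the name `neuron_conductance` are all assumptions made for illustration.

```python
import numpy as np

def hidden(x, W1, b1):
    """ReLU hidden layer; x has shape (d_in,) or (n, d_in)."""
    return np.maximum(0.0, x @ W1.T + b1)

def output(h, w2, b2):
    """Scalar tanh output for each row of hidden activations."""
    return np.tanh(h @ w2 + b2)

def neuron_conductance(x, baseline, W1, b1, w2, b2, steps=1000):
    """Approximate per-unit conductance with a Riemann sum over the
    straight-line path from baseline to x (an assumed approximation)."""
    alphas = np.linspace(0.0, 1.0, steps + 1)[:, None]
    path = baseline + alphas * (x - baseline)   # interpolated samples
    h = hidden(path, W1, b1)                    # unit activations
    out = output(h, w2, b2)
    # Partial derivative of the output w.r.t. each hidden unit,
    # written by hand for this toy network: (1 - tanh^2) * w2.
    grad_h = (1.0 - out ** 2)[:, None] * w2
    delta_h = np.diff(h, axis=0)                # activation change per step
    return (grad_h[1:] * delta_h).sum(axis=0)

# Hypothetical toy network: 2 inputs, 3 hidden units.
W1 = np.array([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]])
b1 = np.array([0.0, 0.1, -0.5])
w2 = np.array([0.5, -0.3, 0.8])
b2 = 0.0
x = np.array([1.0, 2.0])
baseline = np.zeros(2)                          # "black image" stand-in

cond = neuron_conductance(x, baseline, W1, b1, w2, b2)
gap = output(hidden(x, W1, b1), w2, b2) - output(hidden(baseline, W1, b1), w2, b2)
print(cond, cond.sum(), gap)
```

A useful sanity check on such a sketch is completeness: the per-unit scores of a layer should sum approximately to the difference between the network output at the sample and at the uninformative sample, and a unit that stays inactive along the whole path should receive a score of zero.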