Learning in Convolutional Neural Networks Accelerated by Transfer Entropy

Adrian Moldovan, Angel Caţaron, Răzvan Andonie

TL;DR

This paper investigates using Transfer Entropy (TE) to quantify directional information transfer between neural layers and to inject TE-based feedback into CNN training. TE is computed for a restricted set of inter-neural pairs from the last two fully connected layers, serving as a momentum-like factor that modulates weight updates while maintaining computational feasibility via a sliding window estimation. The approach aims to accelerate training, improve stability, and enhance interpretability by linking learning dynamics to information transfer, with practical trade-offs demonstrated on multiple benchmark datasets. The study concludes that TE-driven updates can deliver favorable accuracy and convergence benefits when paired with a constrained TE computation strategy, suggesting a path toward information-guided learning in deep networks.

Abstract

Recently, there has been growing interest in applying Transfer Entropy (TE) to quantify the effective connectivity between artificial neurons. In a feedforward network, the TE can be used to quantify the relationships between neuron output pairs located in different layers. Our focus is on how to include the TE in the learning mechanisms of a Convolutional Neural Network (CNN) architecture. We introduce a novel training mechanism for CNN architectures that integrates TE feedback connections. Adding the TE feedback parameter accelerates the training process, as fewer epochs are needed. On the flip side, it adds computational overhead to each epoch. According to our experiments on CNN classifiers, a reasonable trade-off between computational overhead and accuracy is achieved by considering only the inter-neural information transfer of a random subset of the neuron pairs from the last two fully connected layers. The TE acts as a smoothing factor, generating stability and becoming active only periodically, not after processing each input sample. Therefore, the TE in our model can be regarded as a slowly changing meta-parameter.
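
For readers unfamiliar with the quantity, the standard definition of transfer entropy from a source series $J$ to a target series $I$ is reproduced here for orientation. The history lengths $k$ and $l$ and the notation $i_t^{(k)}$, $j_t^{(l)}$ follow the usual convention and are assumptions; they are not taken from this summary and may differ from the notation used in the paper.

$$
TE_{J \to I} = \sum_{i_{t+1},\, i_t^{(k)},\, j_t^{(l)}} p\left(i_{t+1}, i_t^{(k)}, j_t^{(l)}\right) \log \frac{p\left(i_{t+1} \mid i_t^{(k)}, j_t^{(l)}\right)}{p\left(i_{t+1} \mid i_t^{(k)}\right)}
$$

The ratio inside the logarithm compares how well the next value of $I$ is predicted with and without the past of $J$. If the past of $J$ adds nothing, the ratio is 1 and the TE is 0.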

Paper Structure (13 sections, 1 equation, 4 figures, 6 tables, 1 algorithm)

This paper contains 13 sections, 1 equation, 4 figures, 6 tables, 1 algorithm.

Figures (4)

  • Figure 1: Illustration of the feedforward phase for the USPS dataset. The green arrows indicate the layer outputs that are used to compute the TE (plotted using https://github.com/HarisIqbal88/PlotNeuralNet).
  • Figure 2: During the feedforward step, we compute the time series $I$ and $J$ and the $\mathbf{te}$ matrix, as shown by the green arrows. When the backward step propagates the errors, we use the $\mathbf{te}$ matrix in the weight updates, as shown in Algorithm 1.
  • Figure 3: Evolution of the standard deviation of the $\mathbf{te}$ values over the first 4 epochs for the SVHN+TE dataset, for the pre-softmax layer. Each data point in the plot represents a batch. The rest of the TE values have a similar shape and decrease slowly during training. The spikes in the TE values at the beginning of each epoch are due to the randomization of the training set. During the first epoch, the TE values are not calculated for the first batches in order to prevent anomalous values, so the plotted value is close to 0 there.
  • Figure 4: Illustration of how time series $I$ and $J$ are produced for a pair of neurons from layers $k$ and $l$, for multiple windows of events $u_1$ … $u_q$.
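
The captions above describe building two time series, $I$ and $J$, from neuron outputs and turning them into a $\mathbf{te}$ value that is later used in the weight update. A minimal sketch of that idea follows. It assumes binarized outputs, history length 1, and a simple plug-in probability estimate. The function names, the threshold, and the `(1 - te)` scaling of the gradient step are illustrative assumptions, not the authors' Algorithm 1.

```python
import math
from collections import Counter


def binarize(values, threshold):
    """Map real-valued neuron outputs to 0/1 (threshold is an assumed choice)."""
    return [1 if v >= threshold else 0 for v in values]


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1."""
    if len(source) != len(target):
        raise ValueError("series must have the same length")
    n = len(target) - 1  # number of (next, current) transitions
    if n < 1:
        return 0.0
    # Counts over (target next, target now, source now) and its marginals.
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    now_pairs = Counter(zip(target[:-1], source[:-1]))
    self_pairs = Counter(zip(target[1:], target[:-1]))
    now_single = Counter(target[:-1])
    te = 0.0
    for (t_next, t_now, s_now), count in triples.items():
        p_joint = count / n
        p_with_source = count / now_pairs[(t_now, s_now)]
        p_without_source = self_pairs[(t_next, t_now)] / now_single[t_now]
        te += p_joint * math.log2(p_with_source / p_without_source)
    return te


def te_scaled_update(weight, grad, lr, te):
    """Hypothetical update: shrink the gradient step by a (1 - te) factor."""
    return weight - lr * grad * (1.0 - te)


# Series J (earlier layer) drives series I (later layer) with a one-step lag.
j_series = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
i_series = [0] + j_series[:-1]
te_value = transfer_entropy(j_series, i_series)
new_weight = te_scaled_update(weight=0.5, grad=0.2, lr=0.1, te=te_value)
```

In this toy example the later series copies the earlier one with a lag, so the estimate is positive, while a constant source yields exactly zero. In a sliding-window setting such as the one illustrated in Figure 4, the same estimator would be applied to the most recent window of events rather than to the full history.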