Information Plane Analysis Visualization in Deep Learning via Transfer Entropy
Adrian Moldovan, Angel Cataron, Razvan Andonie
TL;DR
This paper addresses how information flows and compresses in deep networks during training and whether compression relates to generalization. It introduces Transfer Entropy (TE) to quantify directional, layer-to-layer information transfer and integrates it with Information Plane (IP) analysis by binarizing activations and computing TE between adjacent layers. The main contribution is the first application of TE to investigate the Information Bottleneck (IB) principle in neural networks. Experiments on shallow networks and CNNs show that TE concentrates in the final layers, decreases during training, and correlates with accuracy and loss, which suggests TE as a layer-wise proxy for compression and a diagnostic for learning dynamics. The approach offers temporally aware insights into training and points to potential TE-guided training strategies or regularization to improve efficiency and generalization in deep networks.
Abstract
In a feedforward network, Transfer Entropy (TE) can be used to measure the influence that one layer has on another by quantifying the information transfer between them during training. According to the Information Bottleneck principle, a neural model's internal representation should compress the input data as much as possible while still retaining sufficient information about the output. Information Plane analysis is a visualization technique for studying this trade-off between compression and information preservation: it plots the information that the compressed representation retains about the input against the information it retains about the output. The claim that information-theoretic compression, measured by mutual information, is causally linked to generalization is plausible, but results from different studies conflict. In contrast to mutual information, TE can capture temporal relationships between variables. To explore such links, our novel approach uses TE to quantify information transfer between neural layers and to perform Information Plane analysis. We obtained encouraging experimental results, opening the possibility for further investigations.
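To make the notion of directional information transfer concrete, the following is a minimal sketch of a plug-in TE estimate between two binarized activation series. It is an illustration under stated assumptions and not the authors' implementation: the function names `binarize` and `transfer_entropy`, the median threshold, and the history length of 1 are all hypothetical choices made here for the example.

```python
import numpy as np


def binarize(activations, threshold=None):
    """Map real-valued activations to {0, 1}.

    Assumption: a single global threshold (the median by default);
    the thresholding rule used in the paper may differ.
    """
    a = np.asarray(activations, dtype=float)
    if threshold is None:
        threshold = np.median(a)
    return (a > threshold).astype(int)


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits.

    Assumption: history length 1 for both series, i.e.
    TE = sum p(y_next, y_prev, x_prev)
             * log2( p(y_next | y_prev, x_prev) / p(y_next | y_prev) ).
    """
    x = np.asarray(source, dtype=int)
    y = np.asarray(target, dtype=int)
    if x.ndim != 1 or x.shape != y.shape or x.size < 2:
        raise ValueError("source and target must be 1-D series of equal length >= 2")

    y_next, y_prev, x_prev = y[1:], y[:-1], x[:-1]

    # Joint histogram over the binary triple (y_next, y_prev, x_prev).
    counts = np.zeros((2, 2, 2))
    np.add.at(counts, (y_next, y_prev, x_prev), 1)
    p_joint = counts / counts.sum()

    # Marginals needed for the two conditional distributions.
    p_yprev_xprev = p_joint.sum(axis=0)    # p(y_prev, x_prev)
    p_ynext_yprev = p_joint.sum(axis=2)    # p(y_next, y_prev)
    p_yprev = p_joint.sum(axis=(0, 2))     # p(y_prev)

    te = 0.0
    for a in range(2):
        for b in range(2):
            for c in range(2):
                p = p_joint[a, b, c]
                if p > 0:  # zero-probability cells contribute nothing
                    cond_full = p / p_yprev_xprev[b, c]
                    cond_hist = p_ynext_yprev[a, b] / p_yprev[b]
                    te += p * np.log2(cond_full / cond_hist)
    return te
```

As a sanity check, if the target series is a one-step delayed copy of a random binary source, `transfer_entropy(source, target)` should be close to 1 bit while the reverse direction should be close to 0, which is the asymmetry that distinguishes TE from mutual information.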
