Entropy-based Guidance of Deep Neural Networks for Accelerated Convergence and Improved Performance

Mackenzie J. Meni; Ryan T. White; Michael Mayo; Kevin Pilkiewicz

TL;DR

This work tackles how to regulate information flow in neural networks by tracking entropy changes through layers. It derives tractable formulas for entropy propagation such as $H(WX) = H(X) + \log(|\det W|)$ and $H(C*X) = H(X) + (l-p+1)(w-q+1) \log(|c_{11}|)$, and builds entropy-based losses $L_{dense}$ and $L_{conv}$ to steer training via $L = L_{acc} + \lambda_1 L_{dense} + \lambda_2 L_{conv}$. Empirical results on MNIST and CIFAR-10 for autoencoding and classification, plus large-scale CNNs and U-Net segmentation, show faster convergence (up to ~4x) and improved accuracy when entropy guidance is applied in alignment with observed information-flow patterns in well-trained networks. The findings offer a practical, scalable approach to information-theoretic regularization that can guide architecture design, training efficiency, and interpretability in computer vision.
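The two entropy-propagation formulas quoted above reduce to simple arithmetic on the layer weights. The sketch below evaluates both entropy shifts with NumPy. It rests on assumptions not confirmed by this summary: that $X$ is an $l \times w$ input, that $C$ is a $p \times q$ filter, and that $c_{11}$ is the top-left filter entry. The function names `dense_entropy_change` and `conv_entropy_change` are hypothetical, and the code does not implement the paper's losses $L_{dense}$ or $L_{conv}$, whose definitions are not given here.

```python
import numpy as np

def dense_entropy_change(W):
    """Entropy shift log|det W| for a square, invertible weight matrix W."""
    sign, logabsdet = np.linalg.slogdet(W)
    if sign == 0:
        raise ValueError("W is singular; the formula requires an invertible matrix")
    return logabsdet

def conv_entropy_change(C, l, w):
    """Entropy shift (l-p+1)(w-q+1) log|c11| for a p x q filter C.

    Assumption: l and w are the input height and width, and c11 = C[0, 0].
    """
    p, q = C.shape
    if C[0, 0] == 0:
        raise ValueError("c11 is zero; log|c11| is undefined")
    return (l - p + 1) * (w - q + 1) * np.log(abs(C[0, 0]))

# Illustrative values only (not taken from the paper).
W = np.diag([2.0, 0.5, 4.0])            # det W = 4
C = np.array([[0.5, 1.0, 0.0],
              [1.0, 0.0, -1.0],
              [0.0, -1.0, 2.0]])        # 3 x 3 filter with c11 = 0.5

dense_shift = dense_entropy_change(W)          # log(4)
conv_shift = conv_entropy_change(C, l=28, w=28)  # 26 * 26 * log(0.5)
print(dense_shift, conv_shift)
```

Using `np.linalg.slogdet` rather than `np.log(abs(np.linalg.det(W)))` avoids overflow or underflow of the determinant for large matrices.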

Abstract

Neural networks have dramatically increased our capacity to learn from large, high-dimensional datasets across innumerable disciplines. However, their decisions are not easily interpretable, their computational costs are high, and building and training them are not straightforward processes. To add structure to these efforts, we derive new mathematical results to efficiently measure the changes in entropy as fully-connected and convolutional neural networks process data. By efficiently measuring the change in entropy as networks process data, patterns critical to a well-performing network can be visualized and identified. Entropy-based loss terms are developed to improve dense and convolutional model accuracy and efficiency by promoting the ideal entropy patterns. Experiments in image compression, image classification, and image segmentation on benchmark datasets demonstrate that these losses guide neural networks to learn rich latent data representations in fewer dimensions, converge in fewer training epochs, and achieve higher accuracy.

Paper Structure (16 sections, 3 theorems, 30 equations, 10 figures, 3 tables)

Introduction
Related Work
Probabilistic Results
Dense Layers
2D Convolutions
Entropy-Based Guidance of Dense and Convolutional Neural Networks
Dense Entropy Loss
2D Convolutional Entropy Loss
Experiments
Experimental Modification of Entropy-based Losses
Dense Autoencoders for Image Compression
CNNs for Image Classification with Statistical Evaluation of Convolutional Entropy Loss
Large-scale CNNs for Classification with Convolutional Entropy Loss
U-Net for Image Segmentation
Conclusion
...and 1 more section

Key Result

Theorem 2

(Cover2006, Corollary to Theorem 8.6.4) Let $X$ be a random variable valued in $\mathbb{R}^d$ and let $W\in\mathbb{R}^{d\times d}$ be a constant matrix. If $W$ is invertible, then the entropy of $WX$ is $H(WX) = H(X) + \log(|\det W|)$.
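The identity $H(WX) = H(X) + \log(|\det W|)$ can be checked numerically in a case where differential entropy has a closed form. The sketch below assumes $X$ is a zero-mean Gaussian with covariance $\Sigma$, so that $WX$ is Gaussian with covariance $W \Sigma W^\top$ and $H(X) = \tfrac{1}{2}\log\big((2\pi e)^d \det \Sigma\big)$. The Gaussian choice, the helper name `gaussian_entropy`, and the random matrices are illustrative assumptions, not part of the paper.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of a zero-mean Gaussian with covariance cov."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)

rng = np.random.default_rng(0)
d = 4

# Build a symmetric positive-definite covariance for X.
A = rng.normal(size=(d, d))
cov_x = A @ A.T + d * np.eye(d)

# A random square matrix is invertible with probability 1.
W = rng.normal(size=(d, d))

h_x = gaussian_entropy(cov_x)
h_wx = gaussian_entropy(W @ cov_x @ W.T)   # covariance of WX is W cov_x W^T

_, logabsdet_w = np.linalg.slogdet(W)
entropy_change = h_wx - h_x

# The two printed numbers should agree up to floating-point error.
print(entropy_change, logabsdet_w)
```

The agreement follows from $\det(W \Sigma W^\top) = (\det W)^2 \det \Sigma$: half of the log of the squared determinant is $\log(|\det W|)$.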

Figures (10)

Figure 1: Average change in entropy per filter in each layer of two VGG16 networks, one trained to classify ImageNet and one randomly initialized. The closed dots are means, box plots show first and third quartiles of entropy change per filter at each layer, and outliers are plotted as open dots.
Figure 2: This plot shows the comparison of $-\log(|x|)$ versus $\frac{1}{|x|}$.
Figure 3: MNIST Reconstruction MSE for Different Latent Dimensions and Dense Entropy Loss Coefficients $\lambda_1$
Figure 4: MNIST Stopping Epoch for Different Latent Dimensions and Entropy Loss Coefficients $\lambda_1$
Figure 5: CIFAR10 Reconstruction MSE for Different Latent Dimensions and Entropy Loss Coefficients $\lambda_1$
...and 5 more figures

Theorems & Definitions (7)

Definition 1
Theorem 2
Theorem 3
proof
Example 4
Example 5
Corollary 6
