
Entropy-based Guidance of Deep Neural Networks for Accelerated Convergence and Improved Performance

Mackenzie J. Meni, Ryan T. White, Michael Mayo, Kevin Pilkiewicz

TL;DR

This work tackles how to regulate information flow in neural networks by tracking entropy changes through layers. It derives tractable formulas for entropy propagation, such as $H(WX) = H(X) + \log|\det W|$ and $H(C * X) = H(X) + (l-p+1)(w-q+1)\log|c_{11}|$, and builds entropy-based losses $L_{dense}$ and $L_{conv}$ that steer training via $L = L_{acc} + \lambda_1 L_{dense} + \lambda_2 L_{conv}$. Empirical results on MNIST and CIFAR-10 for autoencoding and classification, plus large-scale CNNs and U-Net segmentation, show faster convergence (up to about 4x) and improved accuracy when entropy guidance is applied in alignment with the information-flow patterns observed in well-trained networks. The findings offer a practical, scalable approach to information-theoretic regularization that can guide architecture design, training efficiency, and interpretability in computer vision.
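The factor $(l-p+1)(w-q+1)\log|c_{11}|$ in the convolution formula can be made plausible with a little linear algebra. The sketch below is our own illustration, not the paper's derivation. It assumes an $l \times w$ input, a $p \times q$ filter, "valid" cross-correlation (no padding, stride 1), and that the relevant linear map is the square block of the convolution matrix acting on the top-left $(l-p+1) \times (w-q+1)$ input pixels. The helper `valid_conv_matrix` and all sizes are hypothetical. That block is upper triangular with $c_{11}$ (here `c[0, 0]`) on every diagonal entry, so its log-absolute-determinant is exactly $(l-p+1)(w-q+1)\log|c_{11}|$.

```python
import numpy as np

def valid_conv_matrix(c, l, w):
    """Hypothetical helper: matrix of the 'valid' 2D cross-correlation of an
    l-by-w input with filter c. Rows index output pixels, columns index input
    pixels, both in row-major order."""
    p, q = c.shape
    out_l, out_w = l - p + 1, w - q + 1
    M = np.zeros((out_l * out_w, l * w))
    for i in range(out_l):
        for j in range(out_w):
            for a in range(p):
                for b in range(q):
                    M[i * out_w + j, (i + a) * w + (j + b)] = c[a, b]
    return M

rng = np.random.default_rng(0)
l, w, p, q = 6, 5, 3, 2            # illustrative sizes
c = rng.standard_normal((p, q))    # random filter; c[0, 0] plays the role of c_11
M = valid_conv_matrix(c, l, w)

# Assumed square block: columns for the top-left (l-p+1) x (w-q+1) input pixels.
out_l, out_w = l - p + 1, w - q + 1
cols = [i * w + j for i in range(out_l) for j in range(out_w)]
S = M[:, cols]

# Each output pixel only reads inputs at or after its own top-left position,
# so S is upper triangular with c[0, 0] on the diagonal.
assert np.allclose(S, np.triu(S))
assert np.allclose(np.diag(S), c[0, 0])

_, logabsdet = np.linalg.slogdet(S)
expected = out_l * out_w * np.log(abs(c[0, 0]))
print(logabsdet, expected)  # equal up to floating-point rounding
```

Only the filter's first coefficient enters the determinant; the remaining coefficients sit strictly above the diagonal.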

Abstract

Neural networks have dramatically increased our capacity to learn from large, high-dimensional datasets across innumerable disciplines. However, their decisions are not easily interpretable, their computational costs are high, and building and training them are not straightforward processes. To add structure to these efforts, we derive new mathematical results to efficiently measure the changes in entropy as fully-connected and convolutional neural networks process data. By effectively measuring the change in entropy as networks process data, patterns critical to a well-performing network can be visualized and identified. Entropy-based loss terms are developed to improve dense and convolutional model accuracy and efficiency by promoting the ideal entropy patterns. Experiments in image compression, image classification, and image segmentation on benchmark datasets demonstrate that these losses guide neural networks to learn rich latent data representations in fewer dimensions, converge in fewer training epochs, and achieve higher accuracy.

Paper Structure

This paper contains 16 sections, 3 theorems, 30 equations, 10 figures, and 3 tables.

Key Result

Theorem 2

(Cover2006, Corollary to Theorem 8.6.4) Let $X$ be a random variable valued in $\mathbb{R}^d$ and let $W\in\mathbb{R}^{d\times d}$ be a constant matrix. If $W$ is invertible, then the entropy of $WX$ is $H(WX) = H(X) + \log|\det W|$.
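Theorem 2, $H(WX) = H(X) + \log|\det W|$, can be checked numerically in a case where entropy has a closed form. The minimal sketch below assumes a zero-mean Gaussian $X \sim \mathcal{N}(0, \Sigma)$, chosen only for convenience (the theorem itself does not require Gaussianity), whose differential entropy in nats is $\tfrac12\log\big((2\pi e)^d \det\Sigma\big)$. Since $WX \sim \mathcal{N}(0, W\Sigma W^\top)$ and $\det(W\Sigma W^\top) = (\det W)^2 \det\Sigma$, the entropy difference should equal $\log|\det W|$. The helper name `gaussian_entropy`, the dimension, and the seed are illustrative assumptions.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (nats) of a multivariate Gaussian with covariance cov."""
    d = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    assert sign > 0, "covariance must be positive definite"
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

rng = np.random.default_rng(0)
d = 4
A = rng.standard_normal((d, d))
cov_x = A @ A.T + d * np.eye(d)   # a positive-definite covariance for X
W = rng.standard_normal((d, d))   # a random square matrix is invertible with probability 1

h_x = gaussian_entropy(cov_x)
h_wx = gaussian_entropy(W @ cov_x @ W.T)   # covariance of WX
_, logabsdet_w = np.linalg.slogdet(W)      # log|det W|, computed stably

print(h_wx - h_x, logabsdet_w)  # equal up to floating-point rounding
```

Using `slogdet` rather than `log(abs(det(W)))` avoids overflow or underflow when $\det W$ is very large or very small, which matters for wide layers.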

Figures (10)

  • Figure 1: Average change in entropy per filter in each layer of two VGG16 networks, one trained to classify ImageNet and one randomly initialized. The closed dots are means, box plots show first and third quartiles of entropy change per filter at each layer, and outliers are plotted as open dots.
  • Figure 2: Comparison of $-\log|x|$ and $\frac{1}{|x|}$.
  • Figure 3: MNIST Reconstruction MSE for Different Latent Dimensions and Dense Entropy Loss Coefficients $\lambda_1$
  • Figure 4: MNIST Stopping Epoch for Different Latent Dimensions and Entropy Loss Coefficients $\lambda_1$
  • Figure 5: CIFAR10 Reconstruction MSE for Different Latent Dimensions and Entropy Loss Coefficients $\lambda_1$

Theorems & Definitions (7)

  • Definition 1
  • Theorem 2
  • Theorem 3
  • Proof
  • Example 4
  • Example 5
  • Corollary 6