Entropy-based Guidance of Deep Neural Networks for Accelerated Convergence and Improved Performance
Mackenzie J. Meni, Ryan T. White, Michael Mayo, Kevin Pilkiewicz
TL;DR
This work addresses how to regulate information flow in neural networks by tracking how entropy changes from layer to layer. It derives tractable formulas for entropy propagation, such as $H(WX) = H(X) + \log|\det W|$ for dense layers and $H(C*X) = H(X) + (l-p+1)(w-q+1)\log|c_{11}|$ for convolutional layers. It then builds entropy-based losses $L_{dense}$ and $L_{conv}$ that steer training through the combined objective $L = L_{acc} + \lambda_1 L_{dense} + \lambda_2 L_{conv}$. Experiments on MNIST and CIFAR-10 for autoencoding and classification, plus large-scale CNNs and U-Net segmentation, show faster convergence (up to ~4x) and improved accuracy when entropy guidance is aligned with the information-flow patterns observed in well-trained networks. The findings offer a practical, scalable approach to information-theoretic regularization that can inform architecture design, training efficiency, and interpretability in computer vision.
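The dense-layer identity can be sanity-checked numerically. The sketch below assumes $X$ is a zero-mean Gaussian vector, chosen only because its differential entropy has the closed form $\tfrac{1}{2}\log\big((2\pi e)^n \det\Sigma\big)$ (the identity itself holds for any $X$ with a density), and that $W$ is square and invertible. It also evaluates the convolutional formula exactly as written, assuming $l \times w$ is the input size, $p \times q$ the kernel size, and $c_{11}$ the kernel's first entry. These readings of the symbols are assumptions, since they are not defined here, and the helper names are hypothetical, not the authors' code.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (nats) of a zero-mean Gaussian with covariance `cov`."""
    n = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    assert sign > 0, "covariance must be positive definite"
    return 0.5 * (n * np.log(2 * np.pi * np.e) + logdet)

def dense_entropy_change(W):
    """Predicted entropy change log|det W| for an invertible square matrix W (hypothetical helper)."""
    _, logabsdet = np.linalg.slogdet(W)
    return logabsdet

def conv_entropy_change(l, w, p, q, c11):
    """Evaluate (l-p+1)(w-q+1) log|c11| as stated above (hypothetical helper)."""
    return (l - p + 1) * (w - q + 1) * np.log(abs(c11))

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
cov_x = A @ A.T + n * np.eye(n)   # positive-definite covariance for X
W = rng.standard_normal((n, n))   # a random Gaussian matrix is almost surely invertible

h_x = gaussian_entropy(cov_x)
h_wx = gaussian_entropy(W @ cov_x @ W.T)  # Cov(WX) = W Cov(X) W^T

# Both numbers should agree up to floating-point error.
print(h_wx - h_x, dense_entropy_change(W))

# Example: 28x28 input, 3x3 kernel, c11 = 0.5 gives 26 * 26 * log(0.5).
print(conv_entropy_change(28, 28, 3, 3, 0.5))
```

Because $\det(W\Sigma W^\top) = (\det W)^2 \det\Sigma$, the half-log-determinant in the Gaussian entropy contributes exactly $\log|\det W|$, which is why the two printed values match.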
Abstract
Neural networks have dramatically increased our capacity to learn from large, high-dimensional datasets across innumerable disciplines. However, their decisions are not easily interpretable, their computational costs are high, and building and training them are not straightforward processes. To add structure to these efforts, we derive new mathematical results that efficiently measure the changes in entropy as fully-connected and convolutional neural networks process data. Measuring these changes allows patterns critical to a well-performing network to be visualized and identified. We then develop entropy-based loss terms that improve the accuracy and efficiency of dense and convolutional models by promoting these ideal entropy patterns. Experiments in image compression, image classification, and image segmentation on benchmark datasets demonstrate that these losses guide neural networks to learn rich latent data representations in fewer dimensions, converge in fewer training epochs, and achieve higher accuracy.
