Mathematical Modeling and Convergence Analysis of Deep Neural Networks with Dense Layer Connectivities in Deep Learning
Jinshu Huang, Haibin Su, Xue-Cheng Tai, Chunlin Wu
TL;DR
This work develops a dense non-local (DNL) framework to model densely connected DNNs as nonlinear integral equations in the deep-layer limit. It casts training as an optimal-control problem and proves a Γ-convergence-based result: the discrete-time learning problems converge to a well-posed continuous-time counterpart, with convergence of optimal values and subsequences of minimizers. The approach unifies various architectures (e.g., DenseNets, self-attention variants) under a common integral-equation formulation and provides a theoretical justification for the observed stability of densely connected, deep models. Experimental results on DenseNet-like configurations corroborate the theory, showing training losses decrease with increasing depth, consistent with the convergence analysis.
Abstract
In deep learning, dense layer connectivity has become a key design principle in deep neural networks (DNNs), enabling efficient information flow and strong performance across a range of applications. In this work, we model densely connected DNNs mathematically and analyze their learning problems in the deep-layer limit. For broad applicability, we present our analysis in a general framework of DNNs with densely connected layers and general non-local feature transformations within layers (with local feature transformations as special cases). We call this the dense non-local (DNL) framework; it includes standard DenseNets and their variants as special cases. In this formulation, densely connected networks are modeled as nonlinear integral equations, in contrast to the ordinary differential equation viewpoint commonly adopted in prior work. We study the associated training problems from an optimal control perspective and prove convergence of the network learning problem to its continuous-time counterpart. In particular, we show the convergence of optimal values and the subsequence convergence of minimizers, using a piecewise linear extension and $\Gamma$-convergence analysis. Our results provide a mathematical foundation for understanding densely connected DNNs and further suggest that such architectures can offer stability when training deep models.
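
To make the integral-equation viewpoint concrete, the following is a minimal illustrative sketch, not the formulation analyzed in the paper. It assumes a toy densely connected forward pass in which each layer receives the input plus a step-size-weighted sum over all earlier layers, which is a rectangle-rule discretization of a Volterra-type equation x(t) = x(0) + ∫_0^t K(t, s) tanh(x(s)) ds. The names `dense_forward` and `toy_kernel`, the scalar kernel K, and the tanh activation are all hypothetical choices made for illustration.

```python
import numpy as np

def dense_forward(x0, depth, kernel, act=np.tanh):
    """Toy densely connected forward pass (illustrative assumption).

    Layer k+1 is computed from ALL earlier layers:
        x_{k+1} = x_0 + h * sum_{j=0..k} kernel(t_{k+1}, t_j) * act(x_j),
    with h = 1/depth and t_j = j*h. This is a rectangle-rule
    discretization of a Volterra-type integral equation on [0, 1].
    """
    h = 1.0 / depth
    t = np.arange(depth + 1) * h
    xs = [np.asarray(x0, dtype=float)]
    for k in range(depth):
        # Dense connectivity: accumulate contributions of layers 0..k.
        acc = sum(kernel(t[k + 1], t[j]) * act(xs[j]) for j in range(k + 1))
        xs.append(xs[0] + h * acc)
    return xs[-1]

def toy_kernel(t, s):
    # Hypothetical smooth scalar weight standing in for trained parameters.
    return np.exp(-(t - s))

x0 = np.array([0.5, -1.0])
outs = {L: dense_forward(x0, L, toy_kernel) for L in (50, 100, 200)}

# Gaps between outputs at successive depths; they should shrink as depth grows.
gap_coarse = np.max(np.abs(outs[100] - outs[50]))
gap_fine = np.max(np.abs(outs[200] - outs[100]))
print(gap_coarse, gap_fine)
```

In this toy setting the output changes less and less as the depth doubles, which is the kind of deep-layer limit behavior the integral-equation model is meant to capture. The sketch covers only the forward pass with fixed weights and says nothing about the training problem or the $\Gamma$-convergence argument.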
