Mathematical Modeling and Convergence Analysis of Deep Neural Networks with Dense Layer Connectivities in Deep Learning

Jinshu Huang, Haibin Su, Xue-Cheng Tai, Chunlin Wu

TL;DR

This work develops a dense non-local (DNL) framework to model densely connected DNNs as nonlinear integral equations in the deep-layer limit. It casts training as an optimal-control problem and proves a Γ-convergence-based result: the discrete-time learning problems converge to a well-posed continuous-time counterpart, with convergence of optimal values and subsequences of minimizers. The approach unifies various architectures (e.g., DenseNets, self-attention variants) under a common integral-equation formulation and provides a theoretical justification for the observed stability of densely connected, deep models. Experimental results on DenseNet-like configurations corroborate the theory, showing training losses decrease with increasing depth, consistent with the convergence analysis.

Abstract

In deep learning, dense layer connectivity has become a key design principle in deep neural networks (DNNs), enabling efficient information flow and strong performance across a range of applications. In this work, we model densely connected DNNs mathematically and analyze their learning problems in the deep-layer limit. For broad applicability, we present our analysis in a general framework of DNNs with densely connected layers and general non-local feature transformations within layers (local feature transformations are special cases). We call this the dense non-local (DNL) framework; it includes standard DenseNets and their variants as special cases. In this formulation, densely connected networks are modeled as nonlinear integral equations, in contrast to the ordinary differential equation viewpoint commonly adopted in prior work. We study the associated training problems from an optimal control perspective and prove that the network learning problem converges to its continuous-time counterpart. In particular, we show convergence of the optimal values and subsequence convergence of the minimizers, using a piecewise linear extension and $\Gamma$-convergence analysis. Our results provide a mathematical foundation for understanding densely connected DNNs and further suggest that such architectures can offer stability when training deep models.
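To make the integral-equation viewpoint concrete, the following is a minimal sketch and not the paper's formulation. It assumes a hypothetical Volterra-type model $x(t) = x_0 + \int_0^t \tanh(\theta(t,s)\,x(s))\,ds$ on $[0,1]$, discretized with a left Riemann sum so that layer $l$ aggregates all earlier states, as in a densely connected network. The function names `kernel` and `dense_forward`, the choice of `tanh`, and the smooth weight function $\theta(t,s)=\cos(t-s)$ are all illustrative assumptions.

```python
import numpy as np


def kernel(t, s):
    # Hypothetical smooth weight function theta(t, s); chosen only for illustration.
    return np.cos(t - s)


def dense_forward(x0, num_layers):
    """Left-Riemann discretization of the assumed integral equation
    x(t) = x0 + int_0^t tanh(theta(t, s) * x(s)) ds on [0, 1]."""
    h = 1.0 / num_layers                      # step size = 1 / depth
    t = np.arange(num_layers + 1) * h         # layer "times" t_0, ..., t_L
    states = [np.asarray(x0, dtype=float)]
    for l in range(1, num_layers + 1):
        # Dense connectivity: layer l reads every earlier state x_0, ..., x_{l-1}.
        contrib = sum(np.tanh(kernel(t[l], t[j]) * states[j]) for j in range(l))
        states.append(states[0] + h * contrib)
    return states[-1]


x0 = np.array([0.5, -1.0])
depths = (8, 16, 32, 64, 128)
outputs = {L: dense_forward(x0, L) for L in depths}

# Distance between the outputs at depth L and depth 2L.
gaps = [float(np.linalg.norm(outputs[2 * L] - outputs[L])) for L in depths[:-1]]
for L, gap in zip(depths[:-1], gaps):
    print(f"L={L:3d} -> 2L={2 * L:3d}: output gap = {gap:.5f}")
```

Under these assumptions, the gap between successive depths shrinks as the depth doubles, which is the forward-pass analogue of a deep-layer limit. The sketch only illustrates consistency of the forward map with fixed weights; the convergence statement in the abstract concerns the learning problems themselves.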

Paper Structure

This paper contains 9 sections, 13 theorems, 87 equations, 3 figures.

Key Result

Theorem 1

(Convergence from the discrete to the continuous learning problem) Consider the problems $(\mathcal{P}_{L})$ and $(\mathcal{P})$ defined in (discrete-time-control problem P_L) and (continuous-time: control problem P), respectively. Let the parameter sets $\Omega_{\Theta;L}$ and $\Omega_{\pmb{\Theta}}$ be giv… Let $\pmb{\Theta}^*_L := \hat{\pmb{\mathcal{I}}}_{L}\Theta^*_{L}$, $L\ge 1$. Then $\{\pmb{\Theta}^…
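For orientation, the standard definition of $\Gamma$-convergence, the tool named in the abstract, can be stated as follows. The symbols $X$, $F_L$, and $F$ are generic placeholders and are not taken from the paper's notation. Let $(X,d)$ be a metric space and $F_L, F : X \to \mathbb{R}\cup\{+\infty\}$. The sequence $(F_L)$ $\Gamma$-converges to $F$ if both conditions below hold for every $x \in X$:

$$
\begin{aligned}
&\text{(liminf inequality)} && F(x) \le \liminf_{L\to\infty} F_L(x_L) \quad \text{for every sequence } x_L \to x,\\
&\text{(recovery sequence)} && \text{there exists } x_L \to x \text{ with } \limsup_{L\to\infty} F_L(x_L) \le F(x).
\end{aligned}
$$

If, in addition, $(F_L)$ is equi-coercive, then $F$ attains its minimum, $\min_X F = \lim_{L\to\infty} \inf_X F_L$, and every cluster point of a sequence of minimizers of $F_L$ is a minimizer of $F$. This is the general mechanism behind statements of the form "optimal values converge and minimizers converge along subsequences".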

Figures (3)

  • Figure 1: Network architecture associated with the DNL framework (equation: DNLF).
  • Figure 2: An illustration of the $\mathbf{flip}(\cdot)$ operation. The left and right pictures show the parameters $\mathbf{W}_3$ and $\mathbf{flip}(\mathbf{W}_3)$, respectively. The elements are arranged on the grid points according to their superscripts. The gray arrow represents the "copy" operation.
  • Figure 3: Training loss vs. epoch number for DenseNets with different numbers of layers $L$ on the SVHN and CIFAR10 datasets. The training losses decrease with increasing $L$, consistent with the theoretical convergence result.

Theorems & Definitions (30)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Remark 1
  • Theorem 1
  • Lemma 2
  • proof
  • Proposition 3
  • proof
  • ...and 20 more