An Adaptive and Stability-Promoting Layerwise Training Approach for Sparse Deep Neural Network Architecture

C G Krishnanunni; Tan Bui-Thanh

TL;DR

Equipping physics-informed neural networks (PINNs) with the proposed adaptive architecture strategy to solve partial differential equations, the authors show numerically that adaptive PINNs not only outperform standard PINNs but also produce interpretable hidden layers with provable stability.

Abstract

This work presents a two-stage adaptive framework for progressively developing deep neural network (DNN) architectures that generalize well for a given training data set. In the first stage, a layerwise training approach is adopted: one new layer is added at a time and trained independently while the parameters of the previous layers are frozen. We impose desirable structures on the DNN by employing manifold regularization, sparsity regularization, and physics-informed terms. We introduce an $\varepsilon-\delta$ stability-promoting concept as a desirable property for a learning algorithm and show that employing manifold regularization yields an $\varepsilon-\delta$ stability-promoting algorithm. Further, we derive the necessary conditions for the trainability of a newly added layer and investigate the training saturation problem. In the second stage of the algorithm (post-processing), a sequence of shallow networks is employed to extract information from the residual produced in the first stage, thereby improving the prediction accuracy. Numerical investigations on prototype regression and classification problems demonstrate that the proposed approach can outperform fully connected DNNs of the same size. Moreover, by equipping the physics-informed neural network (PINN) with the proposed adaptive architecture strategy to solve partial differential equations, we numerically show that adaptive PINNs not only are superior to standard PINNs but also produce interpretable hidden layers with provable stability. We also apply our architecture design strategy to solve inverse problems governed by elliptic partial differential equations.
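The two stages described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the toy data, the fixed random input lift `W_in`, the residual layer form `h + tanh(h W + b)`, the zero initialization of each new layer, the accept-or-halve step rule, and, in stage 2, shallow networks with random hidden weights and least-squares output weights in place of fully trained ones. The manifold, sparsity, and physics-informed regularizers are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(X, W_in, layers):
    """Residual forward pass: h <- h + tanh(h W + b) for every hidden layer."""
    h = X @ W_in
    for W, b in layers:
        h = h + np.tanh(h @ W + b)
    return h

def train_new_layer(h_prev, y, w_out, steps=300, lr=0.1):
    """Fit one new residual layer and the output weights; h_prev stays frozen."""
    width = h_prev.shape[1]
    W, b = np.zeros((width, width)), np.zeros(width)  # zero init: layer starts as identity
    n = len(y)

    def loss(W, b, w_out):
        pred = (h_prev + np.tanh(h_prev @ W + b)) @ w_out
        return float(np.mean((pred - y) ** 2))

    best = loss(W, b, w_out)
    for _ in range(steps):
        a = np.tanh(h_prev @ W + b)
        r = 2.0 * ((h_prev + a) @ w_out - y) / n      # dLoss/dpred
        dz = np.outer(r, w_out) * (1.0 - a ** 2)      # backprop through tanh
        cand = (W - lr * h_prev.T @ dz,
                b - lr * dz.sum(axis=0),
                w_out - lr * (h_prev + a).T @ r)
        cand_loss = loss(*cand)
        if cand_loss < best:                          # accept only improving steps
            (W, b, w_out), best = cand, cand_loss
        else:
            lr *= 0.5                                 # otherwise shrink the step
    return W, b, w_out, best

# Toy regression data (hypothetical, for illustration only).
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1]

width = 8
W_in = rng.normal(size=(2, width)) / np.sqrt(2)       # fixed random input lift (assumption)
layers, w_out = [], np.zeros(width)
stage1_losses = [float(np.mean(y ** 2))]              # loss of the zero predictor

# Stage 1: add one layer at a time; earlier layers are never updated.
for _ in range(3):
    h_prev = forward(X, W_in, layers)
    W, b, w_out, layer_loss = train_new_layer(h_prev, y, w_out)
    layers.append((W, b))
    stage1_losses.append(layer_loss)
    if len(layers) == 1:
        first_layer = (W.copy(), b.copy())            # snapshot to check freezing later

# Stage 2: shallow networks fitted one after another to the remaining residual.
pred = forward(X, W_in, layers) @ w_out
residual_norms = [float(np.linalg.norm(y - pred))]
for _ in range(3):
    V, c = rng.normal(size=(2, 20)), rng.normal(size=20)
    H = np.tanh(X @ V + c)                                # random hidden features (simplification)
    coef, *_ = np.linalg.lstsq(H, y - pred, rcond=None)   # least-squares output weights
    pred = pred + H @ coef
    residual_norms.append(float(np.linalg.norm(y - pred)))

print("stage 1 losses:", stage1_losses)
print("stage 2 residual norms:", residual_norms)
```

Because each new layer starts as the identity map and inherits the current output weights, its initial loss equals the loss reached by the previous layer, so the stage 1 loss in this sketch cannot increase when a layer is added. In stage 2, the least-squares fit can always fall back to zero output weights, so the residual norm cannot grow either.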

Paper Structure (41 sections, 6 theorems, 67 equations, 13 figures, 10 tables, 1 algorithm)

This paper contains 41 sections, 6 theorems, 67 equations, 13 figures, 10 tables, 1 algorithm.

Introduction
Related work
Our contributions
Proposed methodology
Layerwise training strategy (\ref{AlgoGreedyLayerwiseResNet})
Sequential residual learning strategy (\ref{Sequential})
An analysis of \ref{AlgoGreedyLayerwiseResNet} and \ref{Sequential}
Layerwise training (\ref{AlgoGreedyLayerwiseResNet})
$\delta$-stability in layerwise training (\ref{AlgoGreedyLayerwiseResNet})
Layerwise training saturation problem
...and 26 more sections

Key Result

Lemma 3.8

Consider the loss \eqref{renormalized_loss} associated with training the layer ${\mathcal{N}}^{(i)}$. For a given $\{ \alpha^{(i)},\ \tau^{(i)}\}$ we define the loss function \eqref{renormalized_loss} as $\Phi\left( \boldsymbol{\theta},\ \gamma^{(i)} \right)$, where $\boldsymbol{\theta}$ ...

Figures (13)

Figure 1: Schematic of \ref{AlgoGreedyLayerwiseResNet}: Training the $(i+1)^{th}$ hidden layer.
Figure 1: Left to right: layerwise training curve on the Boston house price prediction problem by \ref{AlgoGreedyLayerwiseResNet}; importance of manifold regularization ($\gamma$) in \ref{AlgoGreedyLayerwiseResNet}; active and inactive parameters in each hidden layer.
Figure 1: Transfer learning strategy. Left to right: transfer learning on the interpretable network achieved by our approach (relative $L^2$ error $= 4.432\times 10^{-3}$ at the end of $100$ epochs); true solution; traditional transfer learning strategy on a baseline (relative $L^2$ error $= 8.72\times 10^{-3}$ at the end of $100$ epochs).
Figure 1: Left to right: active and inactive parameters in each hidden layer (20 neurons in each hidden layer); active and inactive parameters in each hidden layer (500 neurons in each hidden layer).
Figure 2: Boston house price prediction problem. Comparison between the proposed approach and other methods. The proposed two-stage approach is the most accurate (by a large margin) while using the fewest network parameters.
...and 8 more figures

Theorems & Definitions (22)

Definition 3.1: Input space and neural transfer map
Definition 3.2: $\varepsilon-\delta$ stability-promoting algorithm
Remark 3.3
Definition 3.4: Discrete $\varepsilon-\delta$ stability-promoting algorithm
Remark 3.5
Definition 3.6: $\delta$-stable function
Remark 3.7
Lemma 3.8: Set of minimizers $\boldsymbol{\theta}^*\left( \gamma^{(i)} \right)$ is upper hemicontinuous with respect to $\gamma^{(i)}$
Proposition 3.9: $\varepsilon-\delta$ stability-promoting algorithm via manifold regularization
Remark 3.10
...and 12 more
