On Dissipativity of Cross-Entropy Loss in Training ResNets

Jens Püttschneider; Timm Faulwasser

TL;DR

The paper casts ResNet and neural ODE training as finite-horizon optimal control problems and develops a dissipativity-based analysis using a soft-cross-entropy regularization. It proves strict dissipativity with respect to a subspace of soft-cross-entropy minimizers, establishing a turnpike property that concentrates optimal trajectories near these minimizers for most of the horizon. Neural ODE extensions and equilibria-preserving discretizations are discussed, and the approach is validated on the two spirals and MNIST datasets, showing that training concentrates near per-class minimizer subspaces and enabling depth cropping. Overall, the framework provides a principled method to determine minimal necessary depth and to understand training dynamics through infinite-horizon-inspired concepts applied to finite-depth networks.
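The depth-cropping idea can be illustrated with a small helper. This is only a sketch under assumptions: the summary does not state the paper's cropping criterion, so the function `crop_depth`, its tolerance `tol`, and the rule "keep the shortest prefix of layers after which the loss stays within `tol` of the final-layer loss" are hypothetical stand-ins for whatever rule the authors use.

```python
def crop_depth(layer_losses, tol=1e-3):
    """Hypothetical cropping rule (not taken from the paper).

    layer_losses[k] is the classification loss evaluated on the state after
    k residual layers (index 0 is the network input). Returns the smallest
    depth k such that every loss from layer k onward is within `tol` of the
    final-layer loss.
    """
    final = layer_losses[-1]
    k = len(layer_losses) - 1
    # Walk backwards while the preceding layer is already close to the final loss.
    while k > 0 and abs(layer_losses[k - 1] - final) <= tol:
        k -= 1
    return k


# Illustrative (made-up) per-layer losses: the loss has settled after two layers.
example_losses = [0.69, 0.30, 0.0504, 0.0502, 0.05]
cropped = crop_depth(example_losses, tol=1e-3)
```

If the loss plateaus early, as the turnpike property suggests, `cropped` is much smaller than the full depth and the remaining layers are candidates for removal.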

Abstract

The training of ResNets and neural ODEs can be formulated and analyzed from the perspective of optimal control. This paper proposes a dissipative formulation of the training of ResNets and neural ODEs for classification problems by including a variant of the cross-entropy as a regularization term in the stage cost. Based on this dissipative formulation, we prove that trained ResNets exhibit the turnpike phenomenon. We then illustrate the turnpike phenomenon numerically by training on the two spirals and MNIST datasets. The phenomenon can be used to find very shallow networks suitable for a given classification task.
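To make the optimal-control reading concrete, the following sketch treats each residual layer as one step of a discrete-time system and sums a per-layer stage cost. Everything specific here is an assumption for illustration: the `tanh` residual map, the names `resnet_rollout` and `ocp_cost`, the weight penalty `reg`, and the use of the standard softmax cross-entropy in place of the paper's soft cross-entropy variant, whose definition is not given in this summary.

```python
import numpy as np


def softmax_cross_entropy(x, y):
    """Standard softmax cross-entropy of state x for target class index y."""
    z = x - np.max(x)  # shift by the maximum for numerical stability
    return float(np.log(np.sum(np.exp(z))) - z[y])


def resnet_rollout(x0, weights, biases):
    """Residual layers as dynamics x_{k+1} = x_k + tanh(W_k x_k + b_k)."""
    xs = [np.asarray(x0, dtype=float)]
    for W, b in zip(weights, biases):
        xs.append(xs[-1] + np.tanh(W @ xs[-1] + b))
    return xs


def ocp_cost(xs, weights, biases, y, reg=1e-2):
    """Finite-horizon objective: sum of stage costs plus a terminal loss.

    Assumed stage cost: cross-entropy of the current state plus a quadratic
    penalty on the layer parameters (which play the role of the control input).
    """
    stage = [
        softmax_cross_entropy(x, y) + reg * (np.sum(W**2) + np.sum(b**2))
        for x, W, b in zip(xs[:-1], weights, biases)
    ]
    terminal = softmax_cross_entropy(xs[-1], y)
    return sum(stage) + terminal, stage


# Usage: three layers with zero parameters leave the state unchanged.
Ws = [np.zeros((2, 2)) for _ in range(3)]
bs = [np.zeros(2) for _ in range(3)]
xs = resnet_rollout(np.zeros(2), Ws, bs)
total, stage = ocp_cost(xs, Ws, bs, y=0)
```

Penalizing the classification loss at every layer, rather than only at the output, is what lets the layer index play the role of time in the dissipativity and turnpike analysis.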

Paper Structure (16 sections, 11 theorems, 66 equations, 5 figures, 1 table)

Introduction
Optimal Control and ResNet Training
Optimal Control Formulation of Deep Learning
Cross-Entropy Loss for Classification
Dissipativity of OCPs
Dissipativity of Cross-Entropy Loss in ResNets
Conceptual Difficulties of Standard Cross-Entropy
Soft Cross-Entropy and Its Properties
Dissipativity in ResNet Training
Turnpikes in ResNet Training
Neural ODEs and Other NN Architectures
Continuous-Time Training Formulation
Continuous-Time Results
Extension to Other NN Architectures
Numerical Experiments
...and 1 more section

Key Result

Lemma 2

The stage cost eq:stagecost has no minimizers in $\mathbb{R}^{C\cdot D}$.
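
The reason behind this lemma can be sketched for a single data point. The stage cost itself is not reproduced in this summary, so the following assumes it is built from the standard softmax cross-entropy of a state $x \in \mathbb{R}^{C}$ with target class $y$ and $C \geq 2$; the symbols $\ell$, $e_y$ and $t$ are introduced here for illustration only.

$$
\ell(x, y) = -\log \frac{e^{x_y}}{\sum_{j=1}^{C} e^{x_j}} = \log\Big(1 + \sum_{j \neq y} e^{x_j - x_y}\Big) > 0 \quad \text{for all } x \in \mathbb{R}^{C},
$$

$$
\ell(t\, e_y, y) = \log\big(1 + (C-1)\, e^{-t}\big) \longrightarrow 0 \quad \text{as } t \to \infty .
$$

Under this assumption the infimum of $\ell(\cdot, y)$ is $0$, but it is only approached along unbounded states and never attained. Summing over $D$ data points gives the same conclusion on $\mathbb{R}^{C\cdot D}$.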

Figures (5)

Figure 1: Illustration of the soft cross-entropy and its minimizer set for two classes with the target class $y=1$.
Figure 2: Two Spirals dataset.
Figure 3: Evolution of the state trajectories for the two classes of the two spirals dataset.
Figure 4: State of the data trajectories in the last layer and the sets of soft cross-entropy minimizers for the two classes, $\mathbb{X}^\star_{1}$ and $\mathbb{X}^\star_{2}$.
Figure 5: The loss over the layers of the ResNet for the MNIST dataset in linear and logarithmic scale. The solid line represents the training loss and the dashed line represents the test loss.

Theorems & Definitions (26)

Definition 1: Strict dissipativity in discrete time
Lemma 2: No minimizers for cross-entropy
proof
Lemma 3: Minimizers of soft cross-entropy
proof
Remark 4: Large data with $\dim x > C$
Lemma 5: Invariance of soft cross-entropy
proof
Lemma 6: $T$ preserves the distance to $\mathbb{X}^\star_{y}$
proof
...and 16 more
