On Dissipativity of Cross-Entropy Loss in Training ResNets
Jens Püttschneider, Timm Faulwasser
TL;DR
The paper casts ResNet and neural ODE training as finite-horizon optimal control problems and develops a dissipativity-based analysis using a soft-cross-entropy regularization in the stage cost. It proves strict dissipativity with respect to a subspace of soft-cross-entropy minimizers, which establishes a turnpike property: optimal trajectories stay close to these minimizers for most of the horizon. Neural ODE extensions and equilibria-preserving discretizations are discussed. The approach is validated on the two spirals and MNIST datasets, where the trained states concentrate near per-class minimizer subspaces, so that layers can be cropped once the turnpike is reached. Overall, the framework offers a principled way to estimate the depth a classification task requires and to interpret the training of finite-depth networks with concepts from infinite-horizon optimal control.
Abstract
The training of ResNets and neural ODEs can be formulated and analyzed from the perspective of optimal control. This paper proposes a dissipative formulation of the training of ResNets and neural ODEs for classification problems by including a variant of the cross-entropy as a regularization in the stage cost. Based on this dissipative formulation, we prove that trained ResNets exhibit the turnpike phenomenon. We then illustrate the turnpike phenomenon numerically by training on the two spirals and MNIST datasets. The turnpike property can be used to find very shallow networks suitable for a given classification task.
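The idea of a stage cost along the layers, and of cropping depth once that cost has settled, can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. It assumes that the hidden state has one component per class and is read directly as logits, that each residual layer has the form x + step * tanh(xW + b), and that a plain cross-entropy stands in for the soft-cross-entropy variant. The names resnet_stage_costs and crop_depth are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(x, labels):
    # Mean cross-entropy, treating the state x itself as logits (assumption).
    p = softmax(x)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels])))

def resnet_stage_costs(x0, labels, weights, biases, step=1.0):
    # Propagate the state through residual layers x <- x + step * tanh(x W + b)
    # and record the cross-entropy of the state before each layer and at the output.
    x = x0
    costs = []
    for W, b in zip(weights, biases):
        costs.append(cross_entropy(x, labels))
        x = x + step * np.tanh(x @ W + b)
    costs.append(cross_entropy(x, labels))
    return x, costs

def crop_depth(costs, tol):
    # Smallest layer index whose stage cost is at most tol (hypothetical cropping rule).
    # Falls back to the full depth if the tolerance is never reached.
    for k, c in enumerate(costs):
        if c <= tol:
            return k
    return len(costs) - 1
```

Given weights from a network trained with such a stage cost, calling resnet_stage_costs on the training data and passing the recorded costs to crop_depth yields a candidate number of layers to keep. The tolerance tol is a user choice in this sketch, not a quantity taken from the paper.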
