Progressive Feedforward Collapse of ResNet Training

Sicong Wang, Kuo Gai, Shihua Zhang

TL;DR

This work investigates the geometry of intermediate ResNet features beyond neural collapse (NC) and proposes Progressive Feedforward Collapse (PFC), a layerwise strengthening of collapse along forward propagation. By embedding forward paths in a Wasserstein-geodesic framework under weight decay, the authors prove monotonic decreases of PFC metrics under mild assumptions and validate PFC empirically across multiple datasets. To bridge NC with data-aware structure, they introduce the multilayer unconstrained feature model (MUFM), which couples all layer features via an optimal-transport regularizer and reveals a trade-off between aligning to the input data and achieving an ETF-like decomposition. Collectively, these contributions extend NC to intermediate layers, provide a theoretical lens on ResNet's forward propagation, and offer a data-driven surrogate that complements UFM for understanding classification dynamics.

Abstract

Neural collapse (NC) is a simple and symmetric phenomenon for deep neural networks (DNNs) at the terminal phase of training, where the last-layer features collapse to their class means and form a simplex equiangular tight frame aligning with the classifier vectors. However, the relationship of the last-layer features to the data and intermediate layers during training remains unexplored. To this end, we characterize the geometry of intermediate layers of ResNet and propose a novel conjecture, progressive feedforward collapse (PFC), claiming the degree of collapse increases during the forward propagation of DNNs. Based on the observation that ResNet with weight decay approximates the geodesic curve in Wasserstein space at the terminal phase, we derive a transparent model for the well-trained ResNet. The metrics of PFC indeed monotonically decrease across depth on various datasets. We propose a new surrogate model, multilayer unconstrained feature model (MUFM), connecting intermediate layers by an optimal transport regularizer. The optimal solution of MUFM is inconsistent with NC but is more concentrated relative to the input data. Overall, this study extends NC to PFC to model the collapse phenomenon of intermediate layers and its dependence on the input data, shedding light on the theoretical understanding of ResNet in classification problems.

Paper Structure (17 sections, 6 theorems, 55 equations, 5 figures, 1 table)

This paper contains 17 sections, 6 theorems, 55 equations, 5 figures, 1 table.

Key Result

Proposition 1

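(Benamou-Brenier formula \cite{benamou1999numerical}) Let $\mu_0, \mu_1 \in \mathcal{P}\left(\mathbb{R}^d\right)$. Then it holds

$$W_2\left(\mu_0, \mu_1\right) = \inf \left\{ \int_0^1 \left\| v_t \right\|_{L^2\left(\mu_t\right)} \, dt \right\},$$

where the infimum is taken among all weakly continuous distributional solutions $\left(\mu_t, v_t\right)$ of the continuity equation $\partial_t \mu_t + \nabla \cdot \left(v_t \mu_t\right) = 0$ connecting $\mu_0$ and $\mu_1$.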
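
The formula can be checked numerically in a simple setting. The following is a minimal sketch, not taken from the paper: it assumes two equal-size 1D empirical measures, for which sorted matching is the optimal transport plan, and moves each particle along a straight line at constant speed. The helper names `w2_1d` and `path_length` and the Gaussian samples are illustrative assumptions. Under the optimal (sorted) pairing, the path length $\int_0^1 \|v_t\|_{L^2(\mu_t)}\,dt$ should match $W_2$, while a shuffled pairing should give a longer path.

```python
import numpy as np

def w2_1d(x0, x1):
    """W_2 between two equal-size 1D empirical measures (sorted matching is optimal in 1D)."""
    return np.sqrt(np.mean((np.sort(x0) - np.sort(x1)) ** 2))

def path_length(x0, x1, steps=100):
    """Approximate int_0^1 ||v_t||_{L^2(mu_t)} dt for straight-line particle paths x0[i] -> x1[i]."""
    ts = np.linspace(0.0, 1.0, steps + 1)
    # Row k holds all particle positions at time ts[k]
    paths = (1 - ts)[:, None] * x0[None, :] + ts[:, None] * x1[None, :]
    # Finite-difference velocity of every particle on every time interval
    velocities = np.diff(paths, axis=0) / np.diff(ts)[:, None]
    # L^2(mu_t) norm of the velocity: root mean square over particles
    speeds = np.sqrt(np.mean(velocities ** 2, axis=1))
    return np.sum(speeds * np.diff(ts))

rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 1.0, size=200)  # samples standing in for mu_0 (illustrative)
x1 = rng.normal(3.0, 0.5, size=200)  # samples standing in for mu_1 (illustrative)

w2 = w2_1d(x0, x1)
optimal = path_length(np.sort(x0), np.sort(x1))            # geodesic pairing
shuffled = path_length(np.sort(x0), rng.permutation(x1))   # arbitrary pairing

print(f"W_2 = {w2:.4f}, optimal path = {optimal:.4f}, shuffled path = {shuffled:.4f}")
```

The straight-line, constant-speed paths under the optimal pairing are the discrete analogue of the Wasserstein geodesic that the paper associates with forward propagation in a well-trained ResNet.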

Figures (5)

  • Figure 1: Illustration of PFC. This $2$D example represents a three-class dataset with four data points per class, each class depicted by a different color. Under the geodesic curve assumption, these points undergo uniform linear motion during forward propagation in ResNet, and their color gradually darkens. The arrow indicates the direction of the motion. (a): Trajectory of features for each class during forward propagation in ResNet. The dotted line shows the motion trajectory. (b): Left: Distribution of input. Middle: Distribution of the intermediate layer. Right: Distribution of the last layer, exhibiting NC. Top: Trajectory of features during forward propagation in ResNet. The features progressively collapse to their respective class means. Bottom: Trajectory of (centered) class means. The class means, centered at the global mean, progressively collapse to the simplex ETF (see Definition 1).
  • Figure 2: Features of ResNet exhibiting PFC on various datasets. In each row, the PFC metrics of different layers on MNIST, Fashion MNIST, CIFAR10, STL10, and CIFAR100 are plotted (a-e). The 'layer0' represents the input features.
  • Figure 3: PFC on various datasets at the final epoch. The green points are calculated by the features of different layers and connected by a green line. The blue curve represents the predicted values under the geodesic curve assumption. The abscissa represents the relative position where $0$ corresponds to the input features. Both observations and predictions are monotonic on various datasets. Panels (a)-(e) correspond to the same datasets as in Figure 2.
  • Figure 4: The difference between MUFM and UFM in terms of global optimality. (a): The collapse degree of the solution of MUFM, measured by PFC metrics. The red line is the collapse degree of data. (b): The collapse degree of the solution of UFM, measured by PFC metrics (log-scale). The solution of UFM exhibits NC, but that of MUFM does not.
  • Figure 5: Trade-off in MUFM. The abscissa is the ratio $\dfrac{\lambda}{\lambda_{\boldsymbol{W}}}$; we fix $\lambda_{\boldsymbol{W}}$ and vary only $\lambda$. The first two panels show that increasing $\lambda$ moves the solution of MUFM away from the NC solution. The last panel shows that increasing $\lambda$ moves the solution of MUFM closer to the data.
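
The toy setting of Figure 1 can be imitated in a few lines. The sketch below is an illustration under stated assumptions, not the authors' code: the class centers, the noise level, and the endpoint (every point lands exactly on an equilateral-triangle vertex) are all hypothetical choices, and `collapse_ratio` is a simple within-class over between-class scatter ratio used as a stand-in, since the exact PFC metrics are not defined in this section. Each point moves along a straight line at constant speed, and the ratio is recorded at evenly spaced positions along the path.

```python
import numpy as np

def collapse_ratio(features, labels):
    """Within-class scatter divided by between-class scatter (smaller means more collapsed)."""
    global_mean = features.mean(axis=0)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        cls = features[labels == c]
        mean_c = cls.mean(axis=0)
        within += np.sum((cls - mean_c) ** 2)
        between += len(cls) * np.sum((mean_c - global_mean) ** 2)
    return within / between

rng = np.random.default_rng(0)
K, n = 3, 4  # three classes, four points per class, as in the Figure 1 caption
labels = np.repeat(np.arange(K), n)

# Hypothetical inputs: noisy 2D clusters around unevenly spaced class centers
centers = np.array([[1.0, 0.2], [0.3, 1.0], [-0.5, -0.4]])
x_in = centers[labels] + 0.3 * rng.normal(size=(K * n, 2))

# Assumed endpoint: vertices of an equilateral triangle (a simplex ETF for K = 3 in 2D)
angles = 2 * np.pi * np.arange(K) / K
etf = np.stack([np.cos(angles), np.sin(angles)], axis=1)
x_out = etf[labels]

# Uniform linear motion from input (t = 0) to collapsed output (t = 1)
ts = np.linspace(0.0, 1.0, 11)
ratios = [collapse_ratio((1 - t) * x_in + t * x_out, labels) for t in ts]

for t, r in zip(ts, ratios):
    print(f"t = {t:.1f}  within/between = {r:.4f}")
```

In this construction the within-class scatter shrinks as $(1-t)^2$, so the ratio decreases along the path whenever the centered input class means are, on balance, positively aligned with their target vertices, which holds for the centers chosen here. Other inputs may need a different assignment of classes to vertices for the decrease to be monotone.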

Theorems & Definitions (8)

  • Definition 1: Simplex ETF
  • Proposition 1
  • Conjecture 1: Progressive Feedforward Collapse conjecture
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Lemma 1
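
Definition 1 is only listed by name here. As a minimal sketch, the code below builds a simplex ETF using the construction common in the NC literature, $\boldsymbol{M} = \sqrt{K/(K-1)}\, \boldsymbol{P} \left(\boldsymbol{I}_K - \tfrac{1}{K} \boldsymbol{1}\boldsymbol{1}^\top\right)$ with $\boldsymbol{P}^\top \boldsymbol{P} = \boldsymbol{I}_K$, which is assumed (not confirmed by this section) to agree with the paper's definition up to scaling. The function name `simplex_etf` and the choice of a random orthonormal $\boldsymbol{P}$ are illustrative.

```python
import numpy as np

def simplex_etf(K, d, seed=0):
    """Return a d x K matrix whose columns form a simplex ETF (this construction needs d >= K)."""
    if d < K:
        raise ValueError("this construction requires d >= K")
    rng = np.random.default_rng(seed)
    # Reduced QR of a random d x K matrix gives P with orthonormal columns
    P, _ = np.linalg.qr(rng.normal(size=(d, K)))
    centering = np.eye(K) - np.ones((K, K)) / K
    return np.sqrt(K / (K - 1)) * P @ centering

K = 4
M = simplex_etf(K=K, d=6)
gram = M.T @ M

# Columns have unit norm, and every pair has inner product -1/(K-1)
off_diagonal = gram[~np.eye(K, dtype=bool)]
print(np.round(np.diag(gram), 6))
print(np.round(off_diagonal, 6))
```

The equal norms and the common pairwise angle are the "equiangular" part of the definition; the columns also sum to zero, which is why class means are centered at the global mean before being compared with the ETF.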