Neural Collapse Dynamics: Depth, Activation, Regularisation, and Feature Norm Threshold

Anamika Paul Rupa

Abstract

Neural collapse (NC) -- the convergence of penultimate-layer features to a simplex equiangular tight frame -- is well understood at equilibrium, but the dynamics governing its onset remain poorly characterised. We identify a simple and predictive regularity: NC occurs when the mean feature norm reaches a model- and dataset-specific critical value, fn*, that is largely invariant to training conditions. This value concentrates tightly within each (model, dataset) pair (CV < 8%); training dynamics primarily affect the rate at which fn approaches fn*, rather than the value itself. In standard training trajectories, the crossing of fn below fn* consistently precedes NC onset, providing a practical predictor with a mean lead time of 62 epochs (MAE 24 epochs). A direct intervention experiment confirms that fn* is a stable attractor of the gradient flow -- perturbations to feature scale are self-corrected during training, with convergence to the same value regardless of the perturbation direction (p > 0.2). Completing the (architecture) × (dataset) grid reveals the paper's strongest result: ResNet-20 on MNIST gives fn* = 5.867 -- a +458% architecture effect versus only +68% on CIFAR-10. The grid is strongly non-additive; fn* cannot be decomposed into independent architecture and dataset contributions. Four structural regularities emerge: (1) depth has a non-monotonic effect on collapse speed; (2) activation jointly determines both collapse speed and fn*; (3) weight decay defines a three-regime phase diagram -- too little slows collapse, an optimal range is fastest, and too much prevents collapse entirely; (4) width monotonically accelerates collapse while shifting fn* by at most 13%. These results establish feature-norm dynamics as an actionable diagnostic for predicting NC timing, suggesting that norm-threshold behaviour is a general mechanism underlying delayed representational reorganisation in deep networks.
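To make the abstract's diagnostic concrete, the following is a minimal Python/NumPy sketch of the quantities involved: the mean feature norm fn over penultimate-layer features, the usual NC1 within-/between-class variability ratio (in the sense of Papyan, Han & Donoho, 2020), and a detector for the fn < fn* crossing that the abstract proposes as a predictor of NC onset. The names (`H`, `y`, `fn_star`) are illustrative assumptions, not taken from the paper's code, and `fn_star` is assumed to have been calibrated beforehand for the given (model, dataset) pair.

```python
import numpy as np

def mean_feature_norm(H: np.ndarray) -> float:
    """Mean L2 norm of penultimate-layer features H, shape (N, d)."""
    return float(np.linalg.norm(H, axis=1).mean())

def nc1(H: np.ndarray, y: np.ndarray) -> float:
    """Variability ratio tr(Sigma_W @ pinv(Sigma_B)) / C, the usual NC1
    metric; small values indicate within-class collapse."""
    classes = np.unique(y)
    C, d = len(classes), H.shape[1]
    mu_global = H.mean(axis=0)
    Sigma_W = np.zeros((d, d))  # within-class covariance, averaged over samples
    Sigma_B = np.zeros((d, d))  # between-class covariance, averaged over classes
    for c in classes:
        Hc = H[y == c]
        mu_c = Hc.mean(axis=0)
        Sigma_W += (Hc - mu_c).T @ (Hc - mu_c) / H.shape[0]
        diff = (mu_c - mu_global)[:, None]
        Sigma_B += (diff @ diff.T) / C
    return float(np.trace(Sigma_W @ np.linalg.pinv(Sigma_B)) / C)

def predicted_nc_epoch(fn_history: list[float], fn_star: float) -> int | None:
    """First epoch at which fn drops below fn*; per the paper, this crossing
    precedes NC onset with a mean lead time of ~62 epochs."""
    for epoch, fn in enumerate(fn_history):
        if fn < fn_star:
            return epoch
    return None  # fn has not yet reached fn*
```

Since fn approaches fn* from above in standard runs (e.g. compressing from roughly 27 to 1.063 in the MNIST baseline of Figure 1), the first crossing is well defined.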

Paper Structure

This paper contains 28 sections, 9 figures, and 8 tables.

Figures (9)

  • Figure 1: MNIST baseline dynamics (MLP-5, ReLU, $\lambda=10^{-4}$). (a) Accuracy; terminal phase entered at epoch 10. (b) NC1 collapses at $T_{\mathrm{NC}}=310$; CE$\to$MSE switch at epoch 200. (c) NC2 and NC3 decline through Phase 2. (d) Feature norm compresses $25\times$ from ${\approx}27$ at epoch 10 of Phase 1 to $\mathrm{fn}=1.063$ at $T_{\mathrm{NC}}$.
  • Figure 2: CIFAR-10 ResNet-20 dynamics (3 seeds, SGD, $\lambda=10^{-3}$). (a) Accuracy; terminal phase entered around epoch 100. (b) NC1 collapses at $T_{\mathrm{NC}}=660$ for all seeds ($\mathrm{fn}=1.506$--$1.521$). (c) $\mathrm{fn}$ at collapse (dashed) is 43% above the MNIST MLP-5 value (dotted).
  • Figure 3: Depth effect (MLP, ReLU, $\lambda=10^{-4}$, MNIST). (a) $T_{\mathrm{NC}}$ vs depth: non-monotonic with depth-3 fastest. (b) $\mathrm{fn}^{*}$ vs depth: higher for shallow (NC1 $<0.05$), lower for deeper (NC1 $<0.01$). (c) Phase-2 fn trajectories (seed 0): all depths decay toward a characteristic value.
  • Figure 4: Activation effect (MLP-5, $\lambda=10^{-4}$, MNIST, 3 seeds each). Top row: NC1 trajectories. Bottom row: fn trajectories. Shaded bands indicate $\mathrm{fn}^{*} \pm \sigma$ across seeds. ReLU collapses at the lowest $\mathrm{fn}^{*} \approx 1.07$; Tanh at $\mathrm{fn}^{*} \approx 1.33$; GELU at $\mathrm{fn}^{*} \approx 2.13$ (CV $=21\%$).
  • Figure 5: Weight decay sweep (MLP-5, ReLU, MNIST). (a) $T_{\mathrm{NC}}$ shifts by up to 90 epochs across the three collapsing $\lambda$ values. (b) $\mathrm{fn}$ at $T_{\mathrm{NC}}$ clusters near the grand mean of 1.038 (CV $=6.4\%$, 95% CI: [0.998, 1.078]; see the sketch after this list). (c) All 9 confirmed values lie within $\pm 1\sigma$.
  • ...and 4 more figures
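The seed statistics quoted in Figure 5(b) -- grand mean, coefficient of variation, and a 95% confidence interval over the nine confirmed fn-at-collapse values -- are standard summaries. Below is a minimal sketch of one way to compute them, assuming the values are collected in a NumPy array; the function name and the choice of a Student-t interval are assumptions for illustration, and the paper may construct its interval differently.

```python
import numpy as np
from scipy import stats

def summarize_fn_at_collapse(values: np.ndarray) -> dict:
    """Grand mean, coefficient of variation, and 95% t-interval for
    fn measured at T_NC across runs (seeds x weight-decay settings)."""
    mean = values.mean()
    cv = values.std(ddof=1) / mean               # sample CV
    ci95 = stats.t.interval(0.95, df=values.size - 1,
                            loc=mean, scale=stats.sem(values))
    return {"mean": float(mean), "cv": float(cv), "ci95": ci95}
```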