The Persistence of Neural Collapse Despite Low-Rank Bias
Connall Garrod, Jonathan P. Keating
TL;DR
This work investigates why neural collapse (NC) and its deep variant (DNC) arise in trained classifiers, focusing on deep unconstrained feature models (UFMs) trained with cross-entropy loss. A global analysis shows that high-rank DNC is not generally optimal as network depth grows, exposing a low-rank bias that constrains the singular values of the optimal output $Z$ and shapes the loss landscape. For deep linear UFMs, the study proves that global minima favor diagonally superior, low-rank structures, and that DNC can persist as a local minimum or critical point, with vanishing gradient or positive semidefinite (PSD) Hessian, when regularization is small. The results extend to deep ReLU UFMs under reasonable assumptions, confirming that the low-rank bias persists across nonlinearities and providing a theoretical foundation for the empirical observation that DNC often appears despite being suboptimal. Overall, the paper offers the first comprehensive theoretical framework linking low-rank bias to the prevalence of DNC, with implications for how optimization dynamics and architecture influence feature-space and weight-space geometry in deep networks.
Abstract
Neural collapse (NC) and its multi-layer variant, deep neural collapse (DNC), describe a structured geometry that occurs in the features and weights of trained deep networks. Recent theoretical work by Sukenik et al. using a deep unconstrained feature model (UFM) suggests that DNC is suboptimal under mean squared error (MSE) loss. They heuristically argue that this is due to low-rank bias induced by L2 regularization. In this work, we extend this result to deep UFMs trained with cross-entropy loss, showing that high-rank structures, including DNC, are not generally optimal. We characterize the associated low-rank bias, proving a fixed bound on the number of non-negligible singular values at global minima as network depth increases. We further analyze the loss surface, demonstrating that DNC is more prevalent in the landscape than other critical configurations, which we argue explains its frequent empirical appearance. Our results are validated through experiments in deep UFMs and deep neural networks.
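To make the setting concrete, the following is a minimal NumPy sketch of a deep linear UFM objective with cross-entropy loss and L2 regularization, together with a count of the non-negligible singular values of the output $Z$. It is illustrative only and is not the authors' formulation: the function names `deep_linear_ufm_loss` and `numerical_rank`, the single shared regularization strength `lam`, the factor 1/2 in the penalty, and the relative tolerance `tol` are all assumptions made for this example.

```python
import numpy as np

def deep_linear_ufm_loss(weights, H, labels, lam):
    """Cross-entropy loss of a deep linear UFM plus an L2 penalty (sketch).

    weights: list of matrices [W_1, ..., W_L], applied in that order
    H:       free feature matrix of shape (d, N), one column per sample
    labels:  integer class labels of shape (N,)
    lam:     L2 strength; one shared value for all factors is an assumption
    """
    Z = H
    for W in weights:
        Z = W @ Z  # after the loop, Z = W_L ... W_1 H (the logits)
    # Numerically stable log-softmax over the class axis (rows).
    Z_shift = Z - Z.max(axis=0, keepdims=True)
    log_probs = Z_shift - np.log(np.exp(Z_shift).sum(axis=0, keepdims=True))
    ce = -log_probs[labels, np.arange(Z.shape[1])].mean()
    # Squared Frobenius norms of every weight matrix and of the features.
    reg = 0.5 * lam * (sum(np.sum(W ** 2) for W in weights) + np.sum(H ** 2))
    return ce + reg, Z

def numerical_rank(Z, tol=1e-6):
    """Count singular values of Z above tol times the largest one."""
    s = np.linalg.svd(Z, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

# A collapse-like configuration: K classes, n samples per class, and
# class-mean logits forming a centered (simplex-type) structure.
K, n = 4, 3
M = np.eye(K) - np.ones((K, K)) / K        # centered class means, rank K - 1
H = np.kron(M, np.ones((1, n)))            # each class mean repeated n times
labels = np.repeat(np.arange(K), n)
weights = [np.eye(K), np.eye(K)]           # two identity layers (assumption)

loss, Z = deep_linear_ufm_loss(weights, H, labels, lam=1e-3)
rank = numerical_rank(Z)                   # K - 1 for this configuration
```

In this toy configuration the output has rank $K - 1$, which is the high-rank structure the abstract contrasts with the lower-rank global minima; the sketch only evaluates the objective at a fixed point and makes no attempt to optimize it or reproduce the paper's results.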
