Neural Collapse in Cumulative Link Models for Ordinal Regression: An Analysis with Unconstrained Feature Model

Chuang Ma, Tomoyuki Obuchi, Toshiyuki Tanaka

TL;DR

This work extends Neural Collapse (NC) theory to ordinal regression (OR) by combining the Cumulative Link Model (CLM) with the Unconstrained Feature Model (UFM), and defines Ordinal Neural Collapse (ONC). ONC comprises three properties: (i) within-class features collapse to their class means, (ii) the class means align with the classifier along a one-dimensional subspace, and (iii) the latent variables (logits) $z_q^*$ are ordered by class, with a simple latent–threshold relation emerging in the zero-regularization limit for symmetric links. The authors derive equations of state (EOS) that reveal a phase transition controlled by $(\lambda_h,\lambda_w)$, and present analytic and numerical results, including $z_q^* \to (b_q+b_{q-1})/2$ for symmetric $g$ and $w^* = \Theta((\lambda_h/\lambda_w)^{1/4})$ as $\lambda_h,\lambda_w\to 0$. Empirical validation on five imbalanced tabular OR datasets plus UTKFace demonstrates ONC under fixed thresholds, which also yield faster convergence and better minority-class accuracy; learnable thresholds still exhibit ONC, albeit with altered ONC3 behavior. These findings provide geometric insights and practical guidelines for OR tasks, highlighting how fixed-threshold designs can harness ONC to improve generalization and training efficiency.

Abstract

A phenomenon known as "Neural Collapse (NC)" in deep classification tasks, in which the penultimate-layer features and the final classifiers exhibit an extremely simple geometric structure, has recently attracted considerable attention, with the expectation that it can deepen our understanding of how deep neural networks behave. The Unconstrained Feature Model (UFM) has been proposed to explain NC theoretically, and a growing body of work extends NC to tasks other than classification and leverages it for practical applications. In this study, we investigate whether a similar phenomenon arises in deep Ordinal Regression (OR) tasks by combining the cumulative link model for OR with the UFM. We show that a phenomenon we call Ordinal Neural Collapse (ONC) indeed emerges and is characterized by the following three properties: (ONC1) all optimal features in the same class collapse to their within-class mean when regularization is applied; (ONC2) these class means align with the classifier, meaning that they collapse onto a one-dimensional subspace; (ONC3) the optimal latent variables (corresponding to logits or preactivations in classification tasks) are aligned according to the class order, and in particular, in the zero-regularization limit, a highly local and simple geometric relationship emerges between the latent variables and the threshold values. We prove these properties analytically within the UFM framework with fixed threshold values and corroborate them empirically across a variety of datasets. We also discuss how these insights can be leveraged in OR, highlighting the use of fixed thresholds.

Paper Structure

This paper contains 31 sections, 13 theorems, 62 equations, 39 figures, 4 tables.

Key Result

Theorem 4.1

Let $p(x)$ be a log-concave function on $\mathbb{R}$, and let $P(x)=\int_{-\infty}^xp(u)\,du$. Then, for any $a<b$, the function $\rho(x)=P(b-x)-P(a-x)$ is log-concave.
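
As a numerical sanity check of Theorem 4.1 (not a proof), the sketch below evaluates $\log\rho(x)$ on a grid and confirms that its central second differences are non-positive. It assumes the logistic density for $p$, which is log-concave; the interval endpoints, grid range, and function names are illustrative choices, not taken from the paper.

```python
import math


def logistic_cdf(u: float) -> float:
    """Logistic CDF P(u), written to avoid overflow for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def log_rho(x: float, a: float, b: float) -> float:
    """log of rho(x) = P(b - x) - P(a - x); rho is positive because a < b."""
    return math.log(logistic_cdf(b - x) - logistic_cdf(a - x))


def max_second_difference(a: float, b: float, lo: float = -8.0,
                          hi: float = 8.0, n: int = 161) -> float:
    """Largest central second difference of log rho on a uniform grid.

    A log-concave function has non-positive second differences, so the
    returned value should not exceed zero (up to rounding error).
    """
    h = (hi - lo) / (n - 1)
    xs = [lo + i * h for i in range(n)]
    vals = [log_rho(x, a, b) for x in xs]
    return max(vals[i - 1] - 2.0 * vals[i] + vals[i + 1]
               for i in range(1, n - 1))


if __name__ == "__main__":
    # Illustrative interval (a, b) = (-1, 2); any a < b can be used.
    print(max_second_difference(a=-1.0, b=2.0))
```

The grid is kept to a moderate range because $\rho(x)$ is a difference of two nearly equal CDF values far in the tails, where floating-point cancellation would dominate the result.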

Figures (39)

  • Figure 1: Solution behavior of EOS in the logit model for $Q=3$ with $\bm{b} =(-10,-8,3,10)$ at $\lambda_h=1$. (Left) $w^*$ and $\bm{z} ^*$ are plotted against $\lambda_w$ on a linear scale. A clear phase transition appears at $\lambda_{w,c}=C/\lambda_h$ (vertical broken line), and the values of $\bm{z} ^*$ in the limit $\lambda_w \to 0$ match well with the theoretical prediction ($z_q^*=(b_q+b_{q-1})/2$). (Right) $w^*$ is plotted on a log-log scale in the small-$\lambda_w$ region. A power-law divergence with exponent $-1/4$, corresponding to the scaling $w^*=\Theta\bigl((\lambda_h/\lambda_w)^{1/4}\bigr)$ with fixed $\lambda_h$, is clearly observed.
  • Figure 2: Epoch-wise average metrics curves for the ER dataset with the logit model.
  • Figure 3: Epoch-wise average metrics curves for the UTKFace dataset with ResNet101 backbone.
  • Figure 4: Latent and feature space visualization for the ER dataset with the logit model.
  • Figure 5: Latent and feature space visualization for the UTKFace dataset with ResNet101.
  • ...and 34 more figures
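
The midpoint prediction quoted in the Figure 1 caption can be illustrated with a small sketch. It assumes the per-class likelihood has the CLM form $P(b_q - z) - P(b_{q-1} - z)$ with the logistic link and simply grid-searches the latent $z$ that maximizes it for the thresholds $\bm{b}=(-10,-8,3,10)$. For a symmetric density $p$, the stationarity condition $p(b_q - z) = p(b_{q-1} - z)$ holds at $z=(b_q+b_{q-1})/2$, so the search should land on the midpoints. This is an illustration of the limiting relation only, not the paper's EOS derivation; the function names are hypothetical.

```python
import math


def logistic_cdf(u: float) -> float:
    """Logistic CDF, written to avoid overflow for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def class_probability(z: float, lower: float, upper: float) -> float:
    """Assumed CLM likelihood of the class whose interval is (lower, upper)."""
    return logistic_cdf(upper - z) - logistic_cdf(lower - z)


def best_latent(lower: float, upper: float, steps: int = 2001) -> float:
    """Grid-search the z in [lower, upper] that maximizes the class probability."""
    grid = [lower + (upper - lower) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda z: class_probability(z, lower, upper))


def compare_with_midpoints(thresholds):
    """Return (grid argmax, midpoint) for each pair of adjacent thresholds."""
    pairs = zip(thresholds, thresholds[1:])
    return [(best_latent(lo, hi), 0.5 * (lo + hi)) for lo, hi in pairs]


if __name__ == "__main__":
    # Thresholds from the Figure 1 caption (Q = 3 classes).
    for z_hat, midpoint in compare_with_midpoints([-10.0, -8.0, 3.0, 10.0]):
        print(f"grid argmax = {z_hat:.3f}, midpoint = {midpoint:.3f}")
```

For these thresholds the midpoints are $-9$, $-2.5$, and $6.5$, which are the values the caption reports $\bm{z}^*$ approaching as $\lambda_w \to 0$.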

Theorems & Definitions (23)

  • Theorem 4.1
  • proof
  • Theorem 4.2: ONC
  • proof
  • Theorem 4.3: EOS, phase transition, and some limiting behaviors
  • proof
  • Theorem A.1: Theorem 6 of Prekopa1973
  • Theorem A.2: Theorem 3 of Prekopa1973
  • Lemma 1
  • proof
  • ...and 13 more