Directional Neural Collapse Explains Few-Shot Transfer in Self-Supervised Learning

Achleshwar Luthra, Yash Salunkhe, Tomer Galanti

TL;DR

This work proves sharp non-asymptotic multiclass generalization bounds for downstream classification whose leading term is the directional CDNV, and links decision-axis collapse to multitask geometry: for independent balanced labelings, small directional CDNV across tasks forces the corresponding decision axes to be nearly orthogonal, helping a single representation support many tasks with minimal interference.

Abstract

Frozen self-supervised representations often transfer well with only a few labels across many semantic tasks. We argue that a single geometric quantity, directional CDNV (decision-axis variance), sits at the core of two favorable behaviors: strong few-shot transfer within a task, and low interference across many tasks. We show that both emerge when variability along class-separating directions is small. First, we prove sharp non-asymptotic multiclass generalization bounds for downstream classification whose leading term is the directional CDNV. The bounds include finite-shot corrections that cleanly separate intrinsic decision-axis variability from centroid-estimation error. Second, we link decision-axis collapse to multitask geometry: for independent balanced labelings, small directional CDNV across tasks forces the corresponding decision axes to be nearly orthogonal, helping a single representation support many tasks with minimal interference. Empirically, across SSL objectives, directional CDNV collapses during pretraining even when classical CDNV remains large, and our bounds closely track few-shot error at practical shot sizes. Additionally, on synthetic multitask data, we verify that SSL learns representations whose induced decision axes are nearly orthogonal. The code and project page of the paper are available at https://dlfundamentals.github.io/directional-neural-collapse/.
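
To make the contrast between classical and directional CDNV concrete, the following is a minimal NumPy sketch. The abstract does not spell out the formulas, so the definitions here are assumptions: classical CDNV is taken as the total within-class variance of class i divided by the squared distance between the two class centroids, and directional CDNV as the variance of class i projected onto the unit vector joining the centroids, divided by the same squared distance. The function names and the synthetic Gaussian features are hypothetical and only serve the illustration.

```python
import numpy as np


def class_stats(feats_i, feats_j):
    """Return the class-i centroid, the centroid difference, and its squared norm."""
    mu_i = feats_i.mean(axis=0)
    mu_j = feats_j.mean(axis=0)
    diff = mu_i - mu_j
    return mu_i, diff, float(diff @ diff)


def cdnv(feats_i, feats_j):
    """Assumed classical CDNV: total variance of class i over squared centroid distance."""
    mu_i, _, dist_sq = class_stats(feats_i, feats_j)
    total_var = np.mean(np.sum((feats_i - mu_i) ** 2, axis=1))
    return float(total_var / dist_sq)


def directional_cdnv(feats_i, feats_j):
    """Assumed directional CDNV: variance of class i along the centroid axis only."""
    mu_i, diff, dist_sq = class_stats(feats_i, feats_j)
    axis = diff / np.sqrt(dist_sq)       # unit decision axis between the centroids
    proj = (feats_i - mu_i) @ axis       # centered projections onto that axis
    return float(np.mean(proj ** 2) / dist_sq)


# Synthetic features (illustration only): tight along coordinate 0, which
# separates the two classes, and noisy in the other 63 coordinates.
rng = np.random.default_rng(0)
d, n = 64, 2000
scale = np.ones(d)
scale[0] = 0.05
center = np.zeros(d)
center[0] = 1.0
feats_i = center + rng.normal(size=(n, d)) * scale
feats_j = -center + rng.normal(size=(n, d)) * scale

classical = cdnv(feats_i, feats_j)
directional = directional_cdnv(feats_i, feats_j)
print(f"classical CDNV:   {classical:.4f}")
print(f"directional CDNV: {directional:.4f}")
```

Under these assumed definitions the directional quantity can never exceed the classical one, since the variance along one unit direction is at most the total variance. On this synthetic data the gap is large because almost all of the within-class variance lies orthogonal to the decision axis.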

Paper Structure (23 sections, 8 theorems, 52 equations, 6 figures)

Key Result

Theorem 4.1

Let $C'\ge 2$ and $m\ge 10$ be integers. Fix a feature map $f:\mathcal{X}\to\mathbb R^{d}$ and class-conditional distributions $D_{1},\dots,D_{C'}$ over $\mathcal{X}$. Define $E^1_{ij}:=\frac{4}{m}(V_{ij}^2+\frac{1}{4} V_{ij})$, $E^2_{ij}:=\frac{V_{ij}}{m}$, $E^3_{ij}:=\frac{\Theta_{ij}+2(m-1)V_{ij}

Figures (6)

  • Figure 1: Directional collapse and multitask orthogonalization in SSL. Self-supervised pretraining suppresses within-class variance along class-separating directions (small directional CDNV) while leaving substantial variance in orthogonal, task-irrelevant subspaces. When directional CDNV is small for multiple independent labelings, the induced decision axes become nearly orthogonal, enabling a single representation to support many tasks with low interference.
  • Figure 2: Decision-axis collapse emerges during SSL training. We track both CDNV and directional CDNV on the training and test sets. Directional CDNV decreases much more than CDNV, indicating that SSL primarily tightens class geometry along separating directions even when overall within-class variability is large.
  • Figure 3: Decision-axis variance yields informative few-shot certificates in SSL. We plot few-shot NCC test error versus the number of shots per class, $m$, for several pretrained SSL encoders, together with certified upper bounds from our analysis. We compare our finite-$m$ bound to the directional-only $m\to\infty$ limit and to the bound of Luthra et al. (2025).
  • Figure 4: Decision-axis variance collapses while orthogonal variance remains large. Within-class variance decomposed into decision-axis and orthogonal components shows rapid collapse along the task-relevant direction despite persistently large orthogonal variance.
  • Figure 5: Multitask decision-axis orthogonalization in SSL. Median absolute cosine similarity (25–75% bands) between decision axes from different semantic labelings during training; alignments decay toward zero and are upper-bounded by our theory (dashed).
  • ...and 1 more figure
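
The few-shot NCC evaluation described for Figure 3 can be sketched as follows: estimate each class centroid from $m$ labeled shots, then assign every test point to the nearest centroid. This is a minimal sketch under stated assumptions, not the paper's evaluation code. The helper name `ncc_error` and the synthetic Gaussian features, which stand in for frozen SSL embeddings, are hypothetical.

```python
import numpy as np


def ncc_error(train_feats, train_labels, test_feats, test_labels, m, rng):
    """Test error of a nearest-class-centroid classifier built from m shots per class."""
    classes = np.unique(train_labels)
    centroids = []
    for c in classes:
        idx = np.flatnonzero(train_labels == c)
        shots = rng.choice(idx, size=m, replace=False)   # m labeled examples
        centroids.append(train_feats[shots].mean(axis=0))
    centroids = np.stack(centroids)                      # shape (C, d)
    # Squared Euclidean distance from every test point to every centroid.
    d2 = ((test_feats[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[d2.argmin(axis=1)]
    return float(np.mean(pred != test_labels))


# Synthetic stand-in for frozen features (illustration only): three classes
# separated along the first three coordinates, where within-class noise is
# small, with much larger noise in the remaining coordinates.
rng = np.random.default_rng(0)
C, d = 3, 32
means = 3.0 * np.eye(C, d)
scale = np.ones(d)
scale[:C] = 0.1


def sample(n_per_class):
    labels = np.repeat(np.arange(C), n_per_class)
    feats = means[labels] + rng.normal(size=(labels.size, d)) * scale
    return feats, labels


X_train, y_train = sample(100)
X_test, y_test = sample(200)
errors = {m: ncc_error(X_train, y_train, X_test, y_test, m, rng) for m in (1, 5, 50)}
for m, err in errors.items():
    print(f"m={m:>2}  NCC test error={err:.3f}")
```

In this toy setting the variance along the separating directions is small, so the error at small $m$ is driven mostly by centroid-estimation noise in the orthogonal coordinates, which shrinks as $m$ grows.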

Theorems & Definitions (13)

  • Theorem 4.1
  • Proposition 4.1: Near-orthogonality from small directional CDNV
  • Proposition 3.1: Pairwise NCC error with tunable coefficients
  • proof
  • Theorem 3.2
  • proof: Proof of Thm. \ref{thm:eq-wts}
  • Theorem 3.2
  • proof: Proof of Thm. \ref{thm:ncc-full-optimized}
  • Proposition 3.2: Near-orthogonality from small directional CDNV
  • proof
  • ...and 3 more
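
The near-orthogonality statement concerns the angle between decision axes induced by different labelings of the same features. As a rough illustration of how that alignment can be measured, the sketch below assumes that the decision axis of a binary labeling is the normalized difference of its two class means, and it reports the absolute cosine between the axes of two independent labelings. The helper `decision_axis` and the synthetic features are hypothetical. The example shows only the measurement and does not reproduce the proposition's bound.

```python
import numpy as np


def decision_axis(feats, labels):
    """Unit vector joining the two class means of a binary (0/1) labeling."""
    diff = feats[labels == 1].mean(axis=0) - feats[labels == 0].mean(axis=0)
    return diff / np.linalg.norm(diff)


# Synthetic multitask features (illustration only): two independent,
# approximately balanced binary labelings, each encoded along its own
# coordinate, plus isotropic noise in every coordinate.
rng = np.random.default_rng(0)
n, d = 4000, 16
labels_a = rng.integers(0, 2, size=n)
labels_b = rng.integers(0, 2, size=n)
feats = 0.5 * rng.normal(size=(n, d))
feats[:, 0] += 2 * labels_a - 1      # task A separates along coordinate 0
feats[:, 1] += 2 * labels_b - 1      # task B separates along coordinate 1

axis_a = decision_axis(feats, labels_a)
axis_b = decision_axis(feats, labels_b)
abs_cos = float(abs(axis_a @ axis_b))
print(f"|cos(axis_a, axis_b)| = {abs_cos:.4f}")
```

Repeating the measurement over many pairs of labelings and taking the median would give a statistic of the kind reported for Figure 5.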