The Impact of Geometric Complexity on Neural Collapse in Transfer Learning

Michael Munn, Benoit Dherin, Javier Gonzalvo

TL;DR

The paper proposes geometric complexity (GC) as a unifying mechanism linking loss-surface flatness, neural collapse (NC), and transfer learning. It proves and verifies that lower GC during pre-training induces stronger NC control, which in turn facilitates transfer, especially in few-shot settings; it also provides a robust, computable generalization bound expressed through GC. The work shows that GC can be estimated efficiently from data and that explicit GC regularization during pre-training yields tangible gains in downstream tasks. This framework offers a practical, geometry-aware perspective for improving pre-training and transfer performance in vision and related domains. It also discusses limitations in language modeling and outlines conditions under which the GC-NC-transfer relationships hold.

Abstract

Many of the recent remarkable advances in computer vision and language models can be attributed to the success of transfer learning via the pre-training of large foundation models. However, a theoretical framework which explains this empirical success is incomplete and remains an active area of research. Flatness of the loss surface and neural collapse have recently emerged as useful pre-training metrics which shed light on the implicit biases underlying pre-training. In this paper, we explore the geometric complexity of a model's learned representations as a fundamental mechanism that relates these two concepts. We show through experiments and theory that mechanisms which affect the geometric complexity of the pre-trained network also influence the neural collapse. Furthermore, we show how this effect of the geometric complexity generalizes to the neural collapse of new classes as well, thus encouraging better performance on downstream tasks, particularly in the few-shot setting.
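
To make the central quantity concrete, here is a minimal sketch of a GC estimate. It assumes the usual definition of geometric complexity as the mean squared Frobenius norm of the model's input-output Jacobian over a data sample; the definition is not stated in this summary. The toy embedding, the weight matrix `W`, and all function names are hypothetical. Finite differences stand in for the automatic differentiation a real pipeline would likely use.

```python
import math

def jacobian_fd(f, x, eps=1e-5):
    """Central finite-difference Jacobian of f at x (rows: outputs, columns: inputs)."""
    n_out = len(f(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_plus = list(x)
        x_plus[j] += eps
        x_minus = list(x)
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac

def geometric_complexity(f, batch, eps=1e-5):
    """Assumed GC definition: mean squared Frobenius norm of the Jacobian over the batch."""
    total = 0.0
    for x in batch:
        jac = jacobian_fd(f, x, eps)
        total += sum(v * v for row in jac for v in row)
    return total / len(batch)

# Hypothetical toy embedding: a fixed linear map, optionally followed by tanh.
W = [[1.0, -2.0], [0.5, 0.0], [0.0, 3.0]]

def linear(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def embed(x):
    return [math.tanh(z) for z in linear(x)]

batch = [[0.1, 0.2], [-0.3, 0.4], [0.0, 0.0]]
gc_linear = geometric_complexity(linear, batch)  # for a linear map this is ||W||_F^2
gc_embed = geometric_complexity(embed, batch)    # tanh shrinks the Jacobian, so GC drops
```

For a linear map the Jacobian is the weight matrix itself, which gives a quick sanity check on the estimate; the saturating nonlinearity can only shrink each Jacobian row, so the second value is smaller.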

Paper Structure

This paper contains 33 sections, 4 theorems, 33 equations, 11 figures.

Key Result

Proposition 4.1

Suppose that we have a balanced multi-class input distribution $Q$ with $k$ classes satisfying the Poincaré inequality (Eq. \ref{equation:poincare}) for some constant $c$. Then the geometric complexity of a network embedding $f$ bounds its neural collapse as measured by Eq. \ref{equation:neural_collapse}; namely, we have … where $k$ is the number of classes, and $d_{ij}$ is the distance between the mean of class $i$ and …
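
The proposition refers to a neural collapse measure built from class means and their pairwise distances $d_{ij}$. The formula itself is not given in this excerpt, so the following is only a sketch under an assumed form: the class-distance normalized variance, that is, the within-class variances of classes $i$ and $j$ divided by $2 d_{ij}^2$, averaged over class pairs. All function names and the toy embeddings are hypothetical.

```python
def class_mean(points):
    """Coordinate-wise mean of a list of embedding vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def class_variance(points, mean):
    """Average squared distance of the class embeddings to their mean."""
    return sum(sq_dist(p, mean) for p in points) / len(points)

def neural_collapse(embeddings_by_class):
    """Assumed NC form: average over class pairs of (Var_i + Var_j) / (2 * d_ij^2)."""
    means = [class_mean(pts) for pts in embeddings_by_class]
    variances = [class_variance(pts, m) for pts, m in zip(embeddings_by_class, means)]
    k = len(means)
    total, pairs = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            d_ij_sq = sq_dist(means[i], means[j])  # squared distance between class means
            total += (variances[i] + variances[j]) / (2 * d_ij_sq)
            pairs += 1
    return total / pairs

# Two classes with the same means but different spread around them.
tight = [[[0.0, 0.1], [0.0, -0.1]], [[4.0, 0.1], [4.0, -0.1]]]
loose = [[[0.0, 1.0], [0.0, -1.0]], [[4.0, 1.0], [4.0, -1.0]]]
nc_tight = neural_collapse(tight)
nc_loose = neural_collapse(loose)
```

Under this form, a lower value means more collapse: the tightly clustered classes score lower than the loosely clustered ones even though the class means are identical.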

Figures (11)

  • Figure 1: Controlling the neural collapse via the model geometric complexity for VGG-13 trained on CIFAR-10. Lower embedding $\mathrm{GC}$ produces lower geometric collapse (Eq. \ref{equation:geometric_collapse}) and more neural collapse (i.e., lower $\mathrm{NC}$) for Top row: increased learning rates, Middle row: decreased batch sizes, and Bottom row: increased L2 regularization.
  • Figure 2: The $\mathrm{GC}$ computation is robust to sampling of Left: the number of examples in the batch, Middle: the number of elements in the Jacobian, or Right: the embedding dimension of the model. Here the $\mathrm{GC}$ and subnet $\mathrm{GC}$ have been computed over 20 trials, plotting the mean and standard deviation for a ResNet-18 model that has been trained to convergence on CIFAR-100. The true value of the $\mathrm{GC}$ for each setting is indicated by a dotted line.
  • Figure 3: VGG-13 trained on CIFAR-10 with 5 random seeds.
  • Figure 4: Controlling target neural collapse through source GC on CIFAR-FS with ResNet-18. Lower Source GC produces more neural collapse on target classes (i.e. lower target NC), and higher 5-shot transfer accuracy for Top row: increased learning rates, Middle row: decreased batch sizes, and Bottom row: increased L2 regularization.
  • Figure 5: VGG-13 on CIFAR-10. Training has reached the terminal phase of training (TPT), with training accuracy equal to 1 (see the first and second columns). Lower $\mathrm{GC}$ in training correlates with higher test accuracy in all settings (see the third and fourth columns).
  • ...and 6 more figures
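
The Figure 2 caption describes estimating GC from a subsample of the Jacobian or of the embedding dimension. One plausible way to do this, which is not necessarily the paper's scheme, is to keep a random subset of the Jacobian's output rows and rescale. The sketch below assumes that approach; the example matrix and all function names are hypothetical.

```python
import random

def frob_sq(rows):
    """Squared Frobenius norm of a matrix given as a list of rows."""
    return sum(v * v for row in rows for v in row)

def sampled_frob_sq(jac, m, rng):
    """Estimate ||J||_F^2 from m of the d output rows, rescaled by d / m.

    Rows are drawn uniformly without replacement, so the estimate is unbiased.
    """
    d = len(jac)
    picked = rng.sample(range(d), m)
    return (d / m) * frob_sq([jac[i] for i in picked])

# Hypothetical Jacobian at a single input: 4 embedding dimensions, 2 input dimensions.
jac = [[1.0, -2.0], [0.5, 0.0], [0.0, 3.0], [2.0, 1.0]]
exact = frob_sq(jac)

rng = random.Random(0)  # fixed seed so the run is reproducible
estimates = [sampled_frob_sq(jac, 2, rng) for _ in range(2000)]
mean_estimate = sum(estimates) / len(estimates)
```

Each row is kept with probability m / d, so rescaling by d / m leaves the expectation equal to the exact squared norm. Averaging such per-example estimates over a batch would then give a sampled GC estimate under the definition assumed here.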

Theorems & Definitions (9)

  • Proposition 4.1
  • Proposition 4.2
  • Proposition 4.3
  • proof
  • Proposition 5.1
  • proof
  • proof
  • Remark A.1
  • proof