Deep Neural Network Models Trained With A Fixed Random Classifier Transfer Better Across Domains

Hafiz Tiomoko Ali, Umberto Michieli, Ji Joong Moon, Daehyun Kim, Mete Ozay

TL;DR

The paper tackles transfer learning under domain shift by fixing the last-layer classifier to an Equiangular Tight Frame (ETF) geometry, inspired by Neural Collapse (NC). It theoretically connects NC to linear random features and random projections, showing that with a linear activation the kernel's covariance term vanishes ($d_2=0$, $d_1=1$) and class separation is maximized. Empirically, backbones pretrained with a fixed ETF classifier achieve superior out-of-domain transfer (gains of up to about 19% over Switchable Whitening) across multiple datasets and architectures, with feature covariances ${\bf C}^\circ$ approaching zero. The approach offers a practical, covariance-minimizing pretraining strategy that improves transfer learning and may inform foundation-model training strategies.

Abstract

The recently discovered Neural Collapse (NC) phenomenon states that the last-layer weights of Deep Neural Networks (DNNs) converge to the so-called Equiangular Tight Frame (ETF) simplex at the terminal phase of their training. This ETF geometry is equivalent to vanishing within-class variability of the last-layer activations. Inspired by NC properties, we explore in this paper the transferability of DNN models trained with their last-layer weights fixed according to ETF. This enforces class separation by eliminating class covariance information, effectively providing implicit regularization. We show that DNN models trained with such a fixed classifier significantly improve transfer performance, particularly on out-of-domain datasets. On a broad range of fine-grained image classification datasets, our approach outperforms i) baseline methods that do not perform any covariance regularization (by up to 22%), as well as ii) methods that explicitly whiten the covariance of activations throughout training (by up to 19%). Our findings suggest that DNNs trained with fixed ETF classifiers offer a powerful mechanism for improving transfer learning across domains.
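
To make the fixed-classifier idea concrete, the following is a minimal NumPy sketch of one standard way to build a simplex ETF weight matrix. It is not taken from the paper: the function name `simplex_etf`, the random orthonormal basis `U`, and the chosen dimensions are illustrative assumptions. The construction scales a centered orthonormal basis so that every class vector has unit norm and every pair of class vectors has cosine similarity $-1/(K-1)$, where $K$ is the number of classes.

```python
import numpy as np


def simplex_etf(num_classes, feat_dim, seed=0):
    """Return a (feat_dim x num_classes) simplex ETF matrix.

    Illustrative sketch only: the random orthonormal basis U is an
    assumption, not necessarily the construction used in the paper.
    """
    if feat_dim < num_classes:
        raise ValueError("this sketch requires feat_dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Reduced QR of a Gaussian matrix gives U with orthonormal columns.
    U, _ = np.linalg.qr(rng.standard_normal((feat_dim, num_classes)))
    K = num_classes
    # Centering projector I - (1/K) 1 1^T removes the global mean direction.
    center = np.eye(K) - np.ones((K, K)) / K
    return np.sqrt(K / (K - 1)) * (U @ center)


W = simplex_etf(num_classes=10, feat_dim=64)
gram = W.T @ W  # pairwise inner products between class vectors
```

In a training pipeline, such a matrix would be loaded into the last layer and excluded from gradient updates, so that only the backbone is trained.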

Paper Structure (6 sections, 1 equation, 3 figures, 1 table)

Figures (3)

  • Figure 1: DNN models pretrained on a source domain $S$ with a classifier following the ETF geometry implicitly minimize within-class feature variability. This translates into increased transferability to out-of-distribution target domains $T$.
  • Figure 2: Test MSE of random feature regression with activation function $\sigma(t)=a_2t^2 + t$, using features extracted from the STL dataset with an RN50 model pretrained on ImageNet.
  • Figure 3: Heatmaps of the covariance matrices ${\bf C}^\circ$ of features obtained after transfer learning on the Flowers dataset using the RN101 model pretrained on the ImageNet dataset. (a) Trainable model with $\frac{1}{p} {\rm tr}\, {\bf C}^\circ = 0.16$, (b) SW with $\frac{1}{p} {\rm tr}\, {\bf C}^\circ = 0.04$, and (c) Fixed model with $\frac{1}{p} {\rm tr}\, {\bf C}^\circ = 0.01$.
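
The normalized trace $\frac{1}{p} {\rm tr}\, {\bf C}^\circ$ reported in Figure 3 can be approximated with a short NumPy sketch. It assumes, as a labeled guess rather than the paper's definition, that ${\bf C}^\circ$ is the covariance of the features after subtracting each sample's class mean. The function name `normalized_cov_trace` and the toy data are hypothetical.

```python
import numpy as np


def normalized_cov_trace(features, labels):
    """Return ((1/p) * trace(C), C) for class-centered features.

    Assumption: C is the pooled within-class covariance, i.e. the
    covariance of features after removing each sample's class mean.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    n, p = features.shape
    centered = np.empty_like(features)
    for c in np.unique(labels):
        mask = labels == c
        # Subtract the mean feature vector of class c from its samples.
        centered[mask] = features[mask] - features[mask].mean(axis=0)
    cov = centered.T @ centered / n
    return np.trace(cov) / p, cov


# Toy example: two classes, two samples each, p = 2 feature dimensions.
toy_features = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.0], [1.0, 3.0]])
toy_labels = np.array([0, 0, 1, 1])
score, cov = normalized_cov_trace(toy_features, toy_labels)
```

Under this assumption, a value near zero means the features of each class sit close to their class mean, which is the trend the caption reports for the fixed model.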