LDReg: Local Dimensionality Regularized Self-Supervised Learning

Hanxun Huang, Ricardo J. G. B. Campello, Sarah Monazam Erfani, Xingjun Ma, Michael E. Houle, James Bailey

TL;DR

Through a range of experiments, the paper demonstrates that LDReg, by increasing local intrinsic dimensionality, improves the representation quality of SSL, and that it can regularize dimensionality at both local and global levels.

Abstract

Representations learned via self-supervised learning (SSL) can be susceptible to dimensional collapse, where the learned representation subspace is of extremely low dimensionality and thus fails to represent the full data distribution and modalities. Dimensional collapse, also known as the "underfilling" phenomenon, is one of the major causes of degraded performance on downstream tasks. Previous work has investigated the dimensional collapse problem of SSL at a global level. In this paper, we demonstrate that representations can span a high-dimensional space globally but collapse locally. To address this, we propose a method called local dimensionality regularization (LDReg). Our formulation is based on the derivation of the Fisher-Rao metric to compare and optimize local distance distributions at an asymptotically small radius for each data point. Through a range of experiments, we demonstrate that LDReg, by increasing the local intrinsic dimensionality, improves the representation quality of SSL. The results also show that LDReg can regularize dimensionality at both local and global levels.
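A hedged illustration of how a Fisher-Rao comparison of local distance distributions can reduce to a comparison of LID values: suppose (an assumption of this sketch, not the paper's general asymptotic setting) that within radius $R$ of a point the distance cdf is an exact power law $F_w(r) = (r/R)^w$, so that its LID equals $w$ at every $r$. The one-parameter Fisher information and the resulting Fisher-Rao distance are then:

```latex
\begin{aligned}
f_w(r) &= \frac{w\, r^{w-1}}{R^{w}}, \qquad 0 < r \le R, \\
\frac{\partial^2}{\partial w^2} \ln f_w(r)
  &= \frac{\partial^2}{\partial w^2}\bigl[\ln w + (w-1)\ln r - w \ln R\bigr]
   = -\frac{1}{w^{2}}
  \;\Longrightarrow\; \mathcal{I}(w) = \frac{1}{w^{2}}, \\
d_{\mathrm{FR}}(w_1, w_2)
  &= \left| \int_{w_1}^{w_2} \sqrt{\mathcal{I}(w)}\, \mathrm{d}w \right|
   = \left| \ln \frac{w_2}{w_1} \right| .
\end{aligned}
```

Under this model the distance depends only on the ratio of the two LIDs, not on the scale $R$. Since the value $c$ minimizing $\sum_i \ln^2(w_i / c)$ is the geometric mean of the $w_i$, this log-ratio geometry also fits summarizing sample-wise LIDs by their geometric mean.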

Paper Structure (31 sections, 8 theorems, 43 equations, 5 figures, 16 tables, 2 algorithms)

Key Result

Theorem 1

If $F$ is continuously differentiable at $r$, then
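The statement above is cut off. Assuming the standard definition from Houle's LID framework, $\mathrm{LID}_F(r) = \lim_{\epsilon \to 0} \ln\bigl(F((1+\epsilon)r)/F(r)\bigr) / \ln(1+\epsilon)$, continuous differentiability lets the limit be evaluated by L'Hôpital's rule as $r F'(r)/F(r)$. The sketch below checks this numerically; the example cdf and the function names (`cdf`, `lid_quotient`, `lid_derivative`) are hypothetical choices for illustration, and the paper's own notation may differ.

```python
import math


def cdf(r: float) -> float:
    # Hypothetical example: distance-to-origin cdf of a 2-D Gaussian with
    # per-axis variance 1/2, F(r) = 1 - exp(-r^2); expm1 keeps small-r accuracy.
    return -math.expm1(-r * r)


def cdf_prime(r: float) -> float:
    # Derivative of the example cdf: F'(r) = 2 r exp(-r^2).
    return 2.0 * r * math.exp(-r * r)


def lid_quotient(F, r: float, eps: float = 1e-6) -> float:
    """Finite-eps version of ln(F((1+eps) r) / F(r)) / ln(1+eps)."""
    return math.log(F((1.0 + eps) * r) / F(r)) / math.log1p(eps)


def lid_derivative(F, dF, r: float) -> float:
    """Closed form r F'(r) / F(r), valid when F is continuously differentiable."""
    return r * dF(r) / F(r)


if __name__ == "__main__":
    for r in (1.0, 0.1, 1e-3):
        print(r, lid_quotient(cdf, r), lid_derivative(cdf, cdf_prime, r))
```

The two columns agree at each radius, and as $r$ shrinks both approach 2, the dimension of the underlying Gaussian, which is the local ($r \to 0$) value of the LID.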

Figures (5)

  • Figure 1: Illustrations with 2D synthetic data. (a-b) The LID value of the anchor point (red star) with (or without) local collapse. (c) Fisher-Rao (FR) metric and mean LID (mLID) estimates. FR measures the distance between two LID distributions and is computed based on our theoretical results. mLID is the geometric mean of sample-wise LID scores. High FR distances and low mLID scores indicate greater dimensional collapse. Global intrinsic dimension (GID) is estimated using the DanCO algorithm ceruti2014danco.
  • Figure 2: (a) Geometric mean of LID values over training epochs. (b) Geometric mean of LID values with varying color jitter strength in the augmentations for SimCLR. The linear evaluation result is reported in the legend. (a-b) LID is computed on the training set. (c-d) The effective rank and LID are computed for samples in the validation set. The solid and transparent bars represent the baseline method with and without LDReg regularization, respectively. MAE uses ViT-B as the encoder, and others use ResNet-50.
  • Figure 3: The caption of each subfigure shows the desired local dimensionality, and its title shows the estimated LID and global intrinsic dimensionality (GID). GID is estimated using the DanCO approach ceruti2014danco. mLID is the geometric mean of estimated sample LIDs.
  • Figure 4: t-SNE visualizations of the representations learned by different pretraining methods. Results are based on ResNet-50 with SimCLR on the ImageNet validation set. Only the first 10 classes are selected for visualization.
  • Figure 5: (a-b) Linear evaluation results and (c-d) effective ranks with varying $\beta$ and $k$. All models are trained on ImageNet for 100 epochs. The results are reported as linear probing accuracy (%) on ImageNet.
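The quantities named in these captions can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the paper's code: per-sample LID uses the maximum-likelihood (Hill-type) estimator over the k nearest neighbours common in the LID literature, mLID is the geometric mean as the captions define it, `fisher_rao_lid` assumes an exact power-law distance model in which the FR distance between LIDs w1 and w2 is |ln(w1/w2)|, and effective rank follows the entropy definition of Roy and Vetterli (2007) applied to centred embeddings. All function names are hypothetical.

```python
import numpy as np


def knn_distances(x: np.ndarray, k: int) -> np.ndarray:
    """Sorted distances from each row of x to its k nearest other rows."""
    sq = np.sum(x * x, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.maximum(d2, 0.0, out=d2)   # clip tiny negatives caused by round-off
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    return np.sort(np.sqrt(d2), axis=1)[:, :k]


def lid_mle(x: np.ndarray, k: int = 20) -> np.ndarray:
    """Per-sample MLE of LID: -(mean_i log(r_i / r_k))^{-1}, i = 1..k."""
    r = knn_distances(x, k)
    log_ratios = np.log(r / r[:, -1:])  # all <= 0; the last column is 0
    return -1.0 / np.mean(log_ratios, axis=1)


def mlid(lids: np.ndarray) -> float:
    """Geometric mean of sample-wise LID estimates."""
    return float(np.exp(np.mean(np.log(lids))))


def fisher_rao_lid(w1: float, w2: float) -> float:
    """|ln(w1 / w2)|, assuming exact power-law local distance distributions."""
    return float(abs(np.log(w1 / w2)))


def effective_rank(z: np.ndarray) -> float:
    """exp(entropy) of normalised singular values (Roy and Vetterli, 2007)."""
    s = np.linalg.svd(z - z.mean(axis=0), compute_uv=False)  # centring: a choice of this sketch
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-np.sum(p * np.log(p))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 1000 points filling a 2-D square, embedded in a 10-D space.
    basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))
    x = rng.uniform(size=(1000, 2)) @ basis.T
    lids = lid_mle(x, k=20)
    print(f"mLID ~ {mlid(lids):.2f}, effective rank ~ {effective_rank(x):.2f}")
```

On this synthetic 2-D example, mLID lands near 2 (the MLE carries some small-sample and boundary bias) and the effective rank is close to 2, even though the ambient space is 10-D. The pairwise-distance step is O(n^2) in memory, so a real evaluation on embeddings would typically batch it or use an approximate nearest-neighbour index.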

Theorems & Definitions (19)

  • Definition 1: houle2017local1
  • Theorem 1: houle2017local1
  • Definition 2
  • Lemma 1
  • Definition 3
  • Remark 1.1
  • Remark 1.2
  • Definition 4
  • Theorem 2
  • Corollary 2.1
  • ...and 9 more