IConE: Batch Independent Collapse Prevention for Self-Supervised Representation Learning

Konstantinos Almpanakis, Anna Kreshuk

Abstract

Self-supervised learning (SSL) has revolutionized representation learning, with Joint-Embedding Architectures (JEAs) emerging as an effective approach for capturing semantic features. Existing JEAs rely on implicit or explicit batch interaction, via negative sampling or statistical regularization, to prevent representation collapse. This reliance becomes problematic in regimes where batch sizes must be small, such as training on high-dimensional scientific data, where memory constraints and class imbalance make large, well-balanced batches infeasible. We introduce IConE (Instance-Contrasted Embeddings), a framework that decouples collapse prevention from the training batch size. Rather than enforcing diversity through batch statistics, IConE maintains a global set of learnable auxiliary instance embeddings regularized by an explicit diversity objective. This transfers the anti-collapse mechanism from the transient batch to a dataset-level embedding space, allowing stable training even when batch statistics are unreliable, down to batch size 1. Across diverse 2D and 3D biomedical modalities, IConE outperforms strong contrastive and non-contrastive baselines throughout the small-batch regime (from B=1 to B=64) and demonstrates marked robustness to severe class imbalance. Geometric analysis shows that IConE preserves high intrinsic dimensionality in the learned representations, preventing the collapse observed in existing JEAs as batch sizes shrink.
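The mechanism summarized above can be illustrated with a minimal NumPy sketch of the three loss terms named in the Figure 1 caption, computed for a single step. Everything concrete here is an assumption rather than the authors' formulation: the squared-distance forms of the invariance and alignment terms, the uniformity-style diversity term standing in for the regularizer, the weights `lam_align` and `lam_reg`, and all function names are hypothetical. The sketch only computes forward values; it does not decide which parameters each term updates (for example, whether the alignment term also moves the anchors).

```python
import numpy as np

# Hypothetical sketch of an IConE-style objective (forward values only).
# Assumed forms: squared-distance invariance/alignment on the unit sphere and a
# uniformity-style diversity term; the paper's exact losses may differ.

def l2_normalize(x, eps=1e-8):
    """Project each row onto the unit hypersphere."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def invariance_loss(z1, z2):
    """Attractive term between two views of the same instances."""
    d = l2_normalize(z1) - l2_normalize(z2)
    return float(np.mean(np.sum(d ** 2, axis=-1)))

def alignment_loss(z, anchors):
    """Attractive term pulling encoder outputs toward their instance anchors."""
    d = l2_normalize(z) - l2_normalize(anchors)
    return float(np.mean(np.sum(d ** 2, axis=-1)))

def diversity_loss(table, t=2.0):
    """Assumed repulsive term: log-mean-exp of -t * squared pairwise distances
    over the WHOLE embedding table, so it never looks at the current batch."""
    e = l2_normalize(table)
    sq = np.sum((e[:, None, :] - e[None, :, :]) ** 2, axis=-1)
    i, j = np.triu_indices(len(e), k=1)
    return float(np.log(np.mean(np.exp(-t * sq[i, j]))))

def icone_losses(z1, z2, idx, table, lam_align=1.0, lam_reg=1.0):
    """z1, z2: (B, D) encoder outputs for two views; idx: (B,) instance ids;
    table: (N, D) persistent per-instance embeddings. B may be 1."""
    anchors = table[idx]
    l_inv = invariance_loss(z1, z2)
    l_align = 0.5 * (alignment_loss(z1, anchors) + alignment_loss(z2, anchors))
    l_reg = diversity_loss(table)
    total = l_inv + lam_align * l_align + lam_reg * l_reg
    return {"invariance": l_inv, "alignment": l_align,
            "regularization": l_reg, "total": total}

# Usage: compare a batch of 1 with a batch of 8 against the same table.
rng = np.random.default_rng(0)
table = rng.normal(size=(100, 16))          # N = 100 instances, D = 16
z1 = rng.normal(size=(1, 16))
z2 = z1 + 0.01 * rng.normal(size=(1, 16))   # a slightly perturbed second view
out_b1 = icone_losses(z1, z2, np.array([7]), table)
zb1 = rng.normal(size=(8, 16))
zb2 = zb1 + 0.01 * rng.normal(size=(8, 16))
out_b8 = icone_losses(zb1, zb2, np.arange(8), table)
print(out_b1["regularization"] == out_b8["regularization"])
```

The point the sketch is meant to show is structural: `diversity_loss` reads only the persistent table, so the repulsive term is the same whether the batch holds one instance or eight. Evaluating it over all N² pairs is only practical for small tables; a real implementation would presumably subsample, which this section does not describe.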

Paper Structure (38 sections, 16 equations, 17 figures, 7 tables, 1 algorithm)

Figures (17)

  • Figure 1: Decoupling invariance and anti-collapse gradients with IConE. Left: Standard JEAs apply both the multi-view invariance objective ($\mathcal{L}_{\text{invariance}}$) and the anti-collapse mechanism ($\mathcal{L}_{\text{regularization}}$) directly within the encoder's dynamic representation space $\vec{Z}$, which couples the anti-collapse constraint to the current batch. Right: IConE moves the anti-collapse mechanism to a separate parameter space. A persistent embedding table yields instance-specific anchors $\vec{E}$, which are structured independently via $\mathcal{L}_{\text{regularization}}$. The encoder $f$ is optimized purely through attractive forces: local view consistency ($\mathcal{L}_{\text{invariance}}$) and alignment to its regularized target ($\mathcal{L}_{\text{alignment}}$). This structural decoupling removes the need for batch-dependent negative sampling or variance statistics and applies the repulsive force to the entire dataset simultaneously.
  • Figure 2: Batch-size stability. Mean and standard deviation of linear-probe top-1 balanced accuracy, aggregated over (A) 2D datasets and (B) 3D datasets.
  • Figure 3: Batch sensitivity analysis. Left: Performance drop from the largest to the smallest batch size (lower is better). Right: Correlation between batch size and performance versus absolute top-1 balanced accuracy. IConE occupies the desirable top-left region: high performance with minimal batch dependence.
  • Figure 4: UMAP visualization of learned representations. Embeddings for OrganMNIST3D across batch sizes (rows) and methods (columns). IConE (leftmost column, green border) maintains well-separated class clusters at all batch sizes, while batch-dependent methods show progressive collapse as batch size decreases.
  • Figure 5: Representation geometry analysis. (A) Alignment vs. uniformity loss for 3D datasets, colored by downstream accuracy. (B) Accuracy vs. effective rank (3D). (C) RankMe vs. batch size (2D). (D) LiDAR vs. batch size (2D). IConE achieves better geometry metrics than the compared methods across all of these measures.
  • ...and 12 more figures
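Figure 5 uses RankMe and effective rank to quantify how many embedding dimensions a method actually uses. The following minimal NumPy sketch of RankMe assumes the standard definition (the exponentiated Shannon entropy of the L1-normalized singular values of the embedding matrix) and contrasts well-spread embeddings with a rank-1 collapse. Whether panel (B)'s effective rank uses exactly this formula is not stated in this section, and the function name and `eps` value are assumptions.

```python
import numpy as np

def rankme(z, eps=1e-7):
    """Smooth effective rank of an embedding matrix z of shape (n, d):
    exp of the Shannon entropy of the L1-normalized singular values."""
    s = np.linalg.svd(z, compute_uv=False)  # singular values, all >= 0
    p = s / (np.sum(s) + eps) + eps         # eps keeps log() finite for zero values
    return float(np.exp(-np.sum(p * np.log(p))))

rng = np.random.default_rng(0)
spread = rng.normal(size=(1000, 32))                               # well-spread embeddings
collapsed = np.outer(rng.normal(size=1000), rng.normal(size=32))   # rank-1 collapse
print(round(rankme(spread), 1), round(rankme(collapsed), 3))
```

On the spread matrix the score should land near the embedding dimension (32), while the collapsed matrix scores close to 1, mirroring the loss of intrinsic dimensionality that the abstract attributes to batch-dependent JEAs at small batch sizes.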