Self-supervised Representation Learning From Random Data Projectors

Yi Sui, Tongzi Wu, Jesse C. Cresswell, Ga Wu, George Stein, Xiao Shi Huang, Xiaochen Zhang, Maksims Volkovs

TL;DR

This work addresses a limitation of augmentation-dependent self-supervised representation learning (SSRL) by proposing Learning from Randomness (LFR), a modality- and architecture-agnostic framework that learns useful representations without domain-specific augmentations or masking. LFR trains a representation model $f_\theta$ to predict the outputs of multiple random projection functions $g^{(k)}$, using lightweight predictors $h^{(k)}_\phi$ and a batch-wise divergence objective derived from Batch-wise Barlow Twins. The method rests on two further components: an EM-based training schedule, and a Fast Determinantal Point Process that selects a diverse set of random projectors. Together these yield robust representations across image, time-series, and tabular data, with notable gains on medical datasets where augmentations are unsafe or ill-suited. The results indicate that learning from randomness is a viable, scalable alternative within SSRL that extends to domains with constrained or domain-specific augmentation strategies, and they highlight the importance of projector diversity and principled optimization in such setups.
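To make the roles of $f_\theta$, $g^{(k)}$, and $h^{(k)}_\phi$ concrete, the following is a minimal NumPy sketch of one LFR loss evaluation. It is not the authors' implementation. The two-layer MLPs, the layer sizes, the row-wise L2 normalisation, and the off-diagonal weight `lam` are all assumptions made for illustration, and the loss is one plausible reading of a batch-wise Barlow Twins-style objective: a batch-by-batch similarity matrix between predictions and targets is pushed toward the identity. Gradient updates and the EM-style alternation between predictor and encoder steps are omitted.

```python
import numpy as np

def make_mlp(in_dim, hidden_dim, out_dim, rng):
    """Random weights for a small two-layer MLP (hypothetical stand-in for any network)."""
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden_dim)),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden_dim), (hidden_dim, out_dim)),
    }

def mlp(params, x):
    """Forward pass: linear -> ReLU -> linear."""
    return np.maximum(x @ params["W1"], 0.0) @ params["W2"]

def batchwise_loss(pred, target, lam=0.01, eps=1e-8):
    """Assumed batch-wise objective: c[i, j] is the cosine similarity between
    the prediction for instance i and the target for instance j."""
    p = pred / (np.linalg.norm(pred, axis=1, keepdims=True) + eps)
    t = target / (np.linalg.norm(target, axis=1, keepdims=True) + eps)
    c = p @ t.T                                   # shape (batch, batch)
    diag = np.diag(c)
    on_diag = np.sum((1.0 - diag) ** 2)           # matching pairs should align
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2) # other pairs should decorrelate
    return on_diag + lam * off_diag

def lfr_loss(x, encoder, projectors, predictors):
    """Sum of per-projector losses; projectors are fixed, never trained."""
    z = mlp(encoder, x)                           # representation z = f_theta(x)
    total = 0.0
    for g, h in zip(projectors, predictors):
        y = mlp(g, x)                             # random target y^(k) = g^(k)(x)
        y_hat = mlp(h, z)                         # prediction h^(k)_phi(z)
        total += batchwise_loss(y_hat, y)
    return total

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 10))                     # toy batch: 16 instances, 10 features
encoder = make_mlp(10, 32, 8, rng)
projectors = [make_mlp(10, 32, 4, rng) for _ in range(3)]
predictors = [make_mlp(8, 16, 4, rng) for _ in range(3)]
loss = lfr_loss(x, encoder, projectors, predictors)
```

In a training loop, only `encoder` and `predictors` would receive gradient updates; the sketch stops at the loss because NumPy has no automatic differentiation.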

Abstract

Self-supervised representation learning (SSRL) has advanced considerably by exploiting the transformation invariance assumption under artificially designed data augmentations. While augmentation-based SSRL algorithms push the boundaries of performance in computer vision and natural language processing, they are often not directly applicable to other data modalities, and can conflict with application-specific data augmentation constraints. This paper presents an SSRL approach that can be applied to any data modality and network architecture because it does not rely on augmentations or masking. Specifically, we show that high-quality data representations can be learned by reconstructing random data projections. We evaluate the proposed approach on a wide range of representation learning tasks that span diverse modalities and real-world applications. We show that it outperforms multiple state-of-the-art SSRL baselines. Due to its wide applicability and strong empirical results, we argue that learning from randomness is a fruitful research direction worthy of attention and further study.

Paper Structure

This paper contains 41 sections, 14 equations, 8 figures, 11 tables, and 2 algorithms.

Figures (8)

  • Figure 1: Top: H&E stained histopathology images have a characteristic appearance with blue tones indicating cell nuclei, while cytoplasm is stained pink [chan2014wonderful]. Bottom: Color jitter with the standard settings of [chen2021exploring] produces unrealistic augmentations with altered meanings. Choosing good augmentations requires domain knowledge [shen2022randstainna].
  • Figure 2: Our proposed architecture for learning from randomness. An input $\mathbf{x}$ is encoded by $f_\theta$ into a useful representation $\mathbf{z}$, while also being fed to random projection functions $g^{(k)}$. Simple, learnable predictor functions $h^{(k)}_\phi$ try to match the outputs $\mathbf{y}^{(k)}$ from the projectors $g^{(k)}$, which is only possible when $\mathbf{z}$ contains rich information about the input.
  • Figure 3: Effect of target diversity
  • Figure 4: Test accuracy with different hyperparameters on Kvasir. Left: Number of random projectors. Middle: Batch size. Right: Predictor training setting.
  • Figure 5: Test accuracy with different embedding dimensions.
  • ...and 3 more figures
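
The TL;DR mentions selecting diverse random projectors with a Fast Determinantal Point Process, and Figure 3 concerns target diversity. As a rough illustration of the idea only, the sketch below picks a diverse subset of candidate projectors by naive greedy log-determinant maximisation. This is a simple stand-in, not the fast algorithm named in the summary. The kernel construction (cosine similarity between flattened batch-wise similarity patterns of each projector's outputs), the function names, and the `jitter` term are assumptions introduced here.

```python
import numpy as np

def projector_kernel(outputs, eps=1e-8):
    """Similarity kernel over candidate projectors (assumed construction).
    outputs: list of (batch, dim) arrays, one per candidate, on the same batch."""
    feats = []
    for y in outputs:
        y = y / (np.linalg.norm(y, axis=1, keepdims=True) + eps)
        a = (y @ y.T).ravel()                 # how this projector relates instances
        feats.append(a / (np.linalg.norm(a) + eps))
    f = np.stack(feats)
    return f @ f.T                            # (n_candidates, n_candidates), PSD

def greedy_diverse_subset(kernel, k, jitter=1e-6):
    """Greedily add the candidate that most increases log det of the selected
    sub-kernel. Assumes k <= number of candidates."""
    n = kernel.shape[0]
    selected = []
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sub = kernel[np.ix_(idx, idx)] + jitter * np.eye(len(idx))
            sign, logdet = np.linalg.slogdet(sub)
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = i, logdet
        selected.append(best)
    return selected

rng = np.random.default_rng(0)
base = rng.normal(size=(16, 4))
# Candidates 0 and 1 are identical on purpose; candidate 2 is independent.
outputs = [base, base.copy(), rng.normal(size=(16, 4))]
chosen = greedy_diverse_subset(projector_kernel(outputs), k=2)
```

Because near-duplicate candidates make the sub-kernel almost singular, its determinant collapses toward zero, so the greedy step prefers a projector whose outputs relate the batch instances differently.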