Self-supervised Representation Learning From Random Data Projectors

Yi Sui; Tongzi Wu; Jesse C. Cresswell; Ga Wu; George Stein; Xiao Shi Huang; Xiaochen Zhang; Maksims Volkovs

TL;DR

This work addresses the dependence of self-supervised representation learning on augmentations by proposing Learning from Randomness (LFR), a modality- and architecture-agnostic framework that learns useful representations without domain-specific augmentations or masking. LFR trains a representation model $f_\theta$ to predict the outputs of multiple random projection functions $g^{(k)}$, using lightweight predictors $h^{(k)}_\phi$ and a batch-wise divergence objective derived from Batch-wise Barlow Twins. The method relies on an EM-based training schedule and on a Fast Determinantal Point Process that selects a diverse set of random projectors. It yields robust representations across image, time-series, and tabular data, with notable gains on medical datasets where augmentations are unsafe or ill-suited. The results indicate that learning from randomness is a viable, scalable alternative in SSRL: it extends applicability to domains where augmentation strategies are constrained or domain-specific, and it highlights the importance of projector diversity and principled optimization in such setups.
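The pretext task summarized above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the random projectors are assumed to be frozen linear maps followed by `tanh`, the divergence is assumed to be a Barlow-Twins-style penalty applied to a batch-by-batch similarity matrix with an illustrative weight `lam`, and the names `make_random_projector`, `batchwise_bt_loss`, and `lfr_objective` are hypothetical. The encoder and predictors are passed in as ordinary callables standing in for $f_\theta$ and $h^{(k)}_\phi$.

```python
import math
import random


def make_random_projector(in_dim, out_dim, rng):
    """Build a frozen random map g^(k). Assumed form: linear layer + tanh."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def project(x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return project


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to (approximately) unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in vec)) + eps
    return [v / norm for v in vec]


def batchwise_bt_loss(pred, target, lam=0.005):
    """Assumed batch-wise Barlow-Twins-style divergence.

    Builds a B x B similarity matrix between predictions and targets,
    pulls the diagonal towards 1 and pushes off-diagonal entries towards 0.
    """
    p = [l2_normalize(v) for v in pred]
    t = [l2_normalize(v) for v in target]
    loss = 0.0
    for i, p_i in enumerate(p):
        for j, t_j in enumerate(t):
            c_ij = sum(a * b for a, b in zip(p_i, t_j))
            loss += (1.0 - c_ij) ** 2 if i == j else lam * c_ij ** 2
    return loss


def lfr_objective(encoder, predictors, projectors, batch):
    """Sum the divergence between h^(k)(f(x)) and g^(k)(x) over all projectors."""
    z = [encoder(x) for x in batch]
    total = 0.0
    for h_k, g_k in zip(predictors, projectors):
        pred = [h_k(z_i) for z_i in z]
        target = [g_k(x) for x in batch]
        total += batchwise_bt_loss(pred, target)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    batch = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
    projectors = [make_random_projector(8, 3, rng) for _ in range(2)]
    encoder = lambda x: x                                 # placeholder for f_theta
    predictors = [lambda z: z[:3] for _ in projectors]    # placeholders for h^(k)
    print(lfr_objective(encoder, predictors, projectors, batch))
```

In an actual training loop the encoder and predictors would be trainable networks updated to reduce this objective, while the projectors stay fixed. The sketch only evaluates the objective and omits the EM-based schedule and the projector selection step.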

Abstract

Self-supervised representation learning (SSRL) has advanced considerably by exploiting the transformation invariance assumption under artificially designed data augmentations. While augmentation-based SSRL algorithms push the boundaries of performance in computer vision and natural language processing, they are often not directly applicable to other data modalities, and can conflict with application-specific data augmentation constraints. This paper presents an SSRL approach that can be applied to any data modality and network architecture because it does not rely on augmentations or masking. Specifically, we show that high-quality data representations can be learned by reconstructing random data projections. We evaluate the proposed approach on a wide range of representation learning tasks that span diverse modalities and real-world applications. We show that it outperforms multiple state-of-the-art SSRL baselines. Due to its wide applicability and strong empirical results, we argue that learning from randomness is a fruitful research direction worthy of attention and further study.

Paper Structure (41 sections, 14 equations, 8 figures, 11 tables, 2 algorithms)

Introduction
Background and Related Work
Representation Learning from Random Data Projectors
Pretext Task: Multi-objective Learning from Randomness
Divergence Measure: Batch-wise Barlow Twins
Diversity Encouragement on Random Data Projectors
Experiments and Evaluation
Datasets
Implementations
Performance Across Data Modalities
Performance on Medical Applications
Impact of Random Data Projector Diversity
Ablation Study
Conclusion
Reproducibility Statement
...and 26 more sections

Figures (8)

Figure 1: Top: H&E stained histopathology images have a characteristic appearance with blue tones indicating cell nuclei, while cytoplasm is stained pink [chan2014wonderful]. Bottom: Color jitter with the standard settings of [chen2021exploring] produces unrealistic augmentations with altered meanings. Choosing good augmentations requires domain knowledge [shen2022randstainna].
Figure 2: Our proposed architecture for learning from randomness. An input $\mathbf{x}$ is encoded by $f_\theta$ into a useful representation $\mathbf{z}$, while also being fed to random projection functions $g^{(k)}$. Simple, learnable predictor functions $h^{(k)}_\phi$ try to match the outputs $\mathbf{y}^{(k)}$ from the projectors $g^{(k)}$, which is only possible when $\mathbf{z}$ contains rich information about the input.
Figure 3: Effect of target diversity
Figure 4: Test accuracy with different hyperparameters on Kvasir. Left: Number of random projectors. Middle: Batch size. Right: Predictor training setting.
Figure 5: Test accuracy with different embedding dimensions.
...and 3 more figures
