URLOST: Unsupervised Representation Learning without Stationarity or Topology

Zeyu Yun, Juexiao Zhang, Yann LeCun, Yubei Chen

TL;DR

URLOST addresses unsupervised representation learning for high-dimensional signals whose topology and stationarity are unknown. It combines three components: Density Adjusted Spectral Clustering (with affinity $A_{ij} = I(S_i; S_j)$, Laplacian $L = D - A$, density adjustment $P = \mathrm{diag}(p(i))$ where $p(i) = q(i)^{\alpha} n(i)^{-\beta}$, and the objective $\min_{YY^T = I} \mathrm{tr}(Y P^{1/2} L P^{1/2} Y^T)$); a learnable Self-organizing Layer (with $z_0 = [g(x^{(1)}, w^{(1)}), \cdots, g(x^{(M)}, w^{(M)})]$); and a Masked Autoencoder that learns representations from masked clusters. Empirically, URLOST outperforms SimCLR and MAE on a synthetic retinal-sampling variant of CIFAR-10, V1 neural decoding, and TCGA miRNA pan-cancer classification; ablations show benefits from clustering-based masking, non-shared projections, and density-adjusted clustering. The results indicate that robust, generalizable unsupervised representations can be learned from non-stationary, irregular data, which supports cross-domain use in natural science and neuroscience. The framework lays the groundwork for extending self-supervised learning (SSL) to domains lacking explicit topology, potentially enabling scalable analysis of diverse high-dimensional signals.
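The clustering objective in the TL;DR can be made concrete with a short NumPy sketch. It relies on a standard linear-algebra fact: for a symmetric matrix, the orthonormal rows that minimize the trace are the eigenvectors with the smallest eigenvalues. Everything beyond the formulas above is an assumption for illustration: the function name `density_adjusted_embedding`, the toy affinity matrix, and the choice to pass $q(i)$ and $n(i)$ in as ready-made positive arrays (this summary does not say how they, or the mutual-information affinity, are estimated).

```python
import numpy as np


def density_adjusted_embedding(A, q, n, alpha, beta, k):
    """Rows of Y minimizing tr(Y P^{1/2} L P^{1/2} Y^T) subject to Y Y^T = I.

    A    : (N, N) symmetric, non-negative affinity matrix (A_ij = I(S_i; S_j)
           in the text; estimating it is outside this sketch).
    q, n : (N,) positive per-node quantities entering p(i); treated as given.
    """
    A = np.asarray(A, dtype=float)
    q = np.asarray(q, dtype=float)
    n = np.asarray(n, dtype=float)

    L = np.diag(A.sum(axis=1)) - A              # L = D - A
    p = q ** alpha * n ** (-beta)               # p(i) = q(i)^alpha * n(i)^(-beta)
    sqrt_p = np.sqrt(p)
    M = sqrt_p[:, None] * L * sqrt_p[None, :]   # P^{1/2} L P^{1/2}

    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors,
    # so the first k eigenvectors solve the constrained trace minimization.
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, :k].T, eigvals[:k]


# Toy affinity (hypothetical): two groups of three nodes, weakly linked.
block = np.ones((3, 3)) - np.eye(3)
weak = 0.01 * np.ones((3, 3))
A_toy = np.block([[block, weak], [weak, block]])
ones = np.ones(6)

Y, _ = density_adjusted_embedding(A_toy, q=ones, n=ones, alpha=1.0, beta=1.0, k=2)
labels = (Y[1] > 0).astype(int)   # sign of the second row separates the two groups
```

With more than two clusters, the rows of `Y` would typically be passed to a clustering step such as k-means, as in standard spectral clustering; that step is omitted here.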

Abstract

Unsupervised representation learning has seen tremendous progress. However, it is constrained by its reliance on domain-specific stationarity and topology, a limitation not found in biological intelligence systems. For instance, unlike computer vision, human vision can process visual signals sampled from highly irregular and non-stationary sensors. We introduce a novel framework that learns from high-dimensional data without prior knowledge of stationarity and topology. Our model, abbreviated as URLOST, combines a learnable self-organizing layer, spectral clustering, and a masked autoencoder (MAE). We evaluate its effectiveness on three diverse data modalities: simulated biological vision data, neural recordings from the primary visual cortex, and gene expressions. Compared to state-of-the-art unsupervised learning methods like SimCLR and MAE, our model excels at learning meaningful representations across diverse modalities without knowing their stationarity or topology. It also outperforms other methods that are not dependent on these factors, setting a new benchmark in the field. We position this work as a step toward unsupervised learning methods capable of generalizing across diverse high-dimensional data modalities.
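To illustrate how the self-organizing layer and the masked autoencoder named in the abstract might connect, the sketch below maps clusters of uneven size to tokens of a common dimension, using a separate (non-shared) projection per cluster, and then hides a random subset of tokens from the encoder. It is a minimal sketch under stated assumptions, not the authors' implementation: $g$ is taken to be linear, the weights are random placeholders for parameters that would be learned, and the function names, cluster assignment, token dimension `d`, and `mask_ratio` are all hypothetical.

```python
import numpy as np


def self_organizing_layer(x, clusters, weights):
    """z_0 = [g(x^(1), w^(1)), ..., g(x^(M), w^(M))] with linear g (an assumption).

    x        : (N,) input signal.
    clusters : list of M integer index arrays; x^(i) = x[clusters[i]].
    weights  : list of M matrices; weights[i] has shape (d, len(clusters[i])).
    """
    # Each cluster has its own projection, so clusters of different sizes
    # all land in the same d-dimensional token space.
    return np.stack([W @ x[idx] for idx, W in zip(clusters, weights)])  # (M, d)


def split_visible_masked(num_clusters, mask_ratio, rng):
    """Randomly choose which cluster tokens are hidden from the encoder."""
    num_masked = int(round(mask_ratio * num_clusters))
    order = rng.permutation(num_clusters)
    return np.sort(order[num_masked:]), np.sort(order[:num_masked])


rng = np.random.default_rng(0)
clusters = [np.array([0, 3, 4]), np.array([1, 2]), np.array([5, 6, 7, 8])]  # hypothetical
d = 4                                                                      # hypothetical token size
weights = [rng.standard_normal((d, len(idx))) for idx in clusters]         # placeholders for learned W^(i)
x = rng.standard_normal(9)

z0 = self_organizing_layer(x, clusters, weights)          # one token per cluster
visible, masked = split_visible_masked(len(clusters), mask_ratio=0.67, rng=rng)
encoder_input = z0[visible]                               # masked tokens are reconstruction targets
```

Masking whole clusters rather than individual dimensions mirrors patch masking in an image MAE, with clusters playing the role of patches.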

Paper Structure

This paper contains 31 sections, 12 equations, 10 figures, 7 tables.

Figures (10)

  • Figure 1: From left to right: unsupervised representation learning through joint embedding and masked auto-encoding; the biological vision system, which perceives via unstructured sensors and understands signals without stationarity or topology [polimeni2010laminar]; and many more such diverse high-dimensional signals in natural science [pachitariu2016suite2p, xu2021mining], which our method supports while most existing unsupervised methods do not.
  • Figure 2: The overview framework of URLOST. The high-dimensional input signal undergoes clustering and self-organization before unsupervised learning using a masked autoencoder for signal reconstruction.
  • Figure 3: Retina sampling. (A) An image in the CIFAR-10 dataset. (B) Retina sampling lattice. Each blue dot represents the center of a Gaussian kernel, which mimics a retinal ganglion cell. (C) Visualization of the car image's signal sampled using the retina lattice. Each kernel's sampled RGB value is displayed at its respective lattice location for visualization purposes. (D) Density-adjusted spectral clustering results. Each unique color represents a cluster, with each kernel colored according to its assigned cluster.
  • Figure 4: Learned weights of a self-organizing layer. (A) The image is cropped into patches, where each patch $x^{(i)}$ first undergoes a different permutation $E^{(i)}$, then the inverse permutation $E^{(i)T}$. (B) The learned weights of the linear self-organizing layer. The $12$th column of $W^{(i)}$ at each position $i$ is reshaped into a patch and visualized. After $W^{(i)}$ undergoes the inverse permutation $E^{(i)T}$, the weights show similar patterns. (C) Visualization of the $37$th column of $W^{(i)}$. Similar to (B).
  • Figure 5: Foveated retinal sampling. (A) Illustration of a Gaussian kernel shown in [Cheung2016retina]. Diagram of a single kernel filter parameterized by a mean $\mu'$ and variance $\sigma'$. (B) The location of each Gaussian kernel is summarized as a point with 2D coordinate $\mu'$. In total, the locations of 1038 Gaussian kernels are plotted. (C) The relationship between eccentricity (distance of the kernel to the center) and radius of the kernel is shown.
  • ...and 5 more figures
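
The foveated sampling described in the captions for Figures 3 and 5 can be sketched as follows: each kernel reads out a Gaussian-weighted average of pixel values around its center, and kernels farther from the image center are wider. This is an illustrative sketch only. The linear growth of kernel width with eccentricity, the randomly placed centers, the parameter values, and the function names are assumptions (the captions state that a relationship exists but not its form), and `sigmas` is used as a standard deviation here.

```python
import numpy as np


def foveated_sigmas(centers, image_center, sigma0, slope):
    """Kernel width as a function of eccentricity; the linear form is an assumption."""
    eccentricity = np.linalg.norm(centers - image_center, axis=1)
    return sigma0 + slope * eccentricity


def gaussian_kernel_sample(image, centers, sigmas):
    """Sample an image with K Gaussian kernels.

    image   : (H, W, C) array.
    centers : (K, 2) kernel means as (row, col) coordinates.
    sigmas  : (K,) kernel standard deviations in pixels.
    Returns : (K, C) array, one Gaussian-weighted average per kernel.
    """
    H, W, C = image.shape
    rows, cols = np.mgrid[0:H, 0:W]
    pixels = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)      # (H*W, 2)
    sq_dist = ((pixels[None, :, :] - centers[:, None, :]) ** 2).sum(axis=-1)   # (K, H*W)
    w = np.exp(-sq_dist / (2.0 * sigmas[:, None] ** 2))
    w /= w.sum(axis=1, keepdims=True)          # normalize each kernel's weights to sum to 1
    return w @ image.reshape(H * W, C)


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))                        # stand-in for a CIFAR-10 image
centers = rng.uniform(0, 31, size=(50, 2))             # hypothetical lattice of 50 kernels
sigmas = foveated_sigmas(centers, np.array([15.5, 15.5]), sigma0=0.5, slope=0.1)
samples = gaussian_kernel_sample(image, centers, sigmas)   # one RGB value per kernel
```

The resulting `samples` array carries no grid structure: neighboring rows need not be neighboring kernels, which is the setting the clustering step is meant to handle.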