GenURL: A General Framework for Unsupervised Representation Learning

Siyuan Li, Zicheng Liu, Zelin Zang, Di Wu, Zhiyuan Chen, Stan Z. Li

TL;DR

GenURL presents a unified framework for unsupervised representation learning (URL) that combines data structural modeling (DSM), which describes global data structures, with low-dimensional transformation (LDT), which learns the embeddings, through a generalized similarity objective. It introduces static and dynamic input similarities and uses a General Kullback-Leibler divergence (GKL) to connect global structures with local transformations, which lets it adapt to dimension reduction (DR), graph embedding (GE), self-supervised learning (SSL), and knowledge distillation (KD) tasks. Across extensive experiments on these four URL tasks, GenURL achieves state-of-the-art results and provides detailed analyses of hyperparameters and loss functions, revealing when to emphasize global topology versus local instance discrimination. The approach offers a practical, task-agnostic pathway to robust representations, with insights into the relationships between DR, GE, SSL, and KD and guidance for future extensions.

Abstract

Unsupervised representation learning (URL), which learns compact embeddings of high-dimensional data without supervision, has made remarkable progress recently. However, URL methods for different requirements have been developed independently, which limits the generalization of the algorithms and becomes especially prohibitive as the number of tasks grows. For example, dimension reduction (DR) methods such as t-SNE and UMAP optimize pair-wise data relationships by preserving the global geometric structure, while self-supervised learning (SSL) methods such as SimCLR and BYOL focus on mining the local statistics of instances under specific augmentations. To address this dilemma, we summarize and propose a unified similarity-based URL framework, GenURL, which can smoothly adapt to various URL tasks. In this paper, we regard URL tasks as different implicit constraints on the data geometric structure that help to seek optimal low-dimensional representations, and that boil down to data structural modeling (DSM) and low-dimensional transformation (LDT). Specifically, DSM provides a structure-based submodule to describe the global structures, and LDT learns compact low-dimensional embeddings with given pretext tasks. Moreover, an objective function, General Kullback-Leibler divergence (GKL), is proposed to connect DSM and LDT naturally. Comprehensive experiments demonstrate that GenURL achieves consistent state-of-the-art performance in self-supervised visual learning, unsupervised knowledge distillation (KD), graph embeddings (GE), and dimension reduction.
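
The two-stage idea in the abstract (first model the data structure as a similarity matrix, then fit embeddings against it) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration, not the paper's implementation: the k-nearest-neighbor graph, the Student-t kernel and its degrees of freedom, the linear map `W` standing in for the learned transformation, and the binary cross-entropy loss standing in for GKL, whose exact form is not given in this summary.

```python
import numpy as np

def pairwise_dist(A):
    """Euclidean distance between every pair of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def knn_graph_distance(X, k):
    """Shortest-path distance on a symmetrized k-NN graph (assumed graph choice)."""
    D = pairwise_dist(X)
    n = len(X)
    G = np.full((n, n), np.inf)
    nn = np.argsort(D, axis=1)[:, 1:k + 1]      # column 0 is the point itself
    rows = np.repeat(np.arange(n), k)
    G[rows, nn.ravel()] = D[rows, nn.ravel()]
    G = np.minimum(G, G.T)                      # make the graph undirected
    np.fill_diagonal(G, 0.0)
    for m in range(n):                          # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G

def t_similarity(D, nu):
    """Unnormalized Student-t kernel (assumed form of the similarity)."""
    return (1.0 + D ** 2 / nu) ** (-(nu + 1.0) / 2.0)

def bce_similarity_loss(P, Q, eps=1e-8):
    """Binary cross-entropy between similarities; a stand-in, not the paper's GKL."""
    Q = np.clip(Q, eps, 1.0 - eps)
    off_diag = ~np.eye(len(P), dtype=bool)
    terms = -(P * np.log(Q) + (1.0 - P) * np.log(1.0 - Q))
    return float(terms[off_diag].mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))                   # toy high-dimensional data

# Stage 1 (structure modeling): a fixed target similarity from graph distances.
P_X = t_similarity(knn_graph_distance(X, k=5), nu=100.0)

# Stage 2 (transformation): similarity of embeddings from a hypothetical linear map.
W = rng.normal(size=(10, 2)) * 0.1
P_Z = t_similarity(pairwise_dist(X @ W), nu=1.0)

loss = bce_similarity_loss(P_X, P_Z)            # quantity a trainer would minimize over W
print(round(loss, 4))
```

In a real setup `W` would be replaced by a neural network and `loss` minimized by gradient descent while `P_X` stays fixed; the sketch only evaluates the objective once.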

Paper Structure

This paper contains 48 sections, 8 equations, 15 figures, 8 tables.

Figures (15)

  • Figure 1: Illustration of various empirical structures of high-dimensional data. We encode COIL-20, CiteSeer, and STL-10 to 2-dim, 128-dim, and 512-dim by GenURL (128-dim and 512-dim latent spaces are then visualized by UMAP in 2-dim). Left: we preserve local geometric structures of the circle manifolds in COIL-20 in the DR task. Middle: the topological and geometric structures of citation networks in CiteSeer are encoded in the GE task. Right: with the instance discriminative proxy task, we learn a discriminative representation in the validation dataset of STL-10.
  • Figure 2: Illustration of GenURL. The data structures are first modeled as similarity $P_{X}$ by calculating the graph distance on each predefined graph. Then, the low-dimensional transformation mapping $f_{\theta}$ is optimized by minimizing $\mathcal{L}$ based on the fixed $P_{X}$.
  • Figure 3: Illustration of two typical issues in unsupervised learning tasks: over-uniformity and ill-clustering.
  • Figure 4: Visualization of $p_X$ and $p_Z$ using the t-distribution. Let $d_{Z}^{X} = \kappa^{-1}(\kappa(d_{X}, \nu_{X}), \nu_{Z})$ be the projected distance of $d_{X}$ in the latent space. When $\nu_X > \nu_Z$, there exists a fixed point $d_F$ between the t-distributions with $\nu_X$ and $\nu_Z$, and we have $d_{X}(i,j)-d_F < d_{Z}(i,j)-d_F$ (pull between neighbors) and $d_{X}(i,k)-d_F < d_{Z}(i,k)-d_F$ (push between disjoint samples).
  • Figure 5: Ablation of $\nu_{Z}$, $\sigma$, and batch size of GenURL for visual SSL tasks on STL-10. GenURL is pre-trained for 800 epochs with ResNet-50.
  • ...and 10 more figures
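
The projected distance $d_{Z}^{X} = \kappa^{-1}(\kappa(d_{X}, \nu_{X}), \nu_{Z})$ from the Figure 4 caption can be explored numerically. The sketch below assumes $\kappa$ is the unnormalized Student-t kernel $\kappa(d, \nu) = (1 + d^2/\nu)^{-(\nu+1)/2}$, which this summary does not state explicitly; the values $\nu_X = 100$ and $\nu_Z = 1$ and the bisection helper `fixed_point` are likewise illustrative choices.

```python
import math

def t_kernel(d, nu):
    """Assumed kernel: kappa(d, nu) = (1 + d^2 / nu) ** (-(nu + 1) / 2)."""
    return (1.0 + d * d / nu) ** (-(nu + 1.0) / 2.0)

def t_kernel_inv(s, nu):
    """Distance d >= 0 whose kernel value is s, i.e. the inverse of t_kernel in d."""
    return math.sqrt(max(nu * (s ** (-2.0 / (nu + 1.0)) - 1.0), 0.0))

def projected_distance(d_x, nu_x, nu_z):
    """Latent distance with the same similarity as input distance d_x."""
    return t_kernel_inv(t_kernel(d_x, nu_x), nu_z)

def fixed_point(nu_x, nu_z, lo=1e-3, hi=50.0, iters=100):
    """Bisection for the distance left unchanged by the projection (nu_x > nu_z)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if projected_distance(mid, nu_x, nu_z) < mid:
            lo = mid        # still in the shrinking regime, fixed point is larger
        else:
            hi = mid        # already in the stretching regime
    return 0.5 * (lo + hi)

nu_x, nu_z = 100.0, 1.0     # illustrative degrees of freedom with nu_x > nu_z
d_f = fixed_point(nu_x, nu_z)
near = projected_distance(0.5, nu_x, nu_z)   # input distance below the fixed point
far = projected_distance(3.0, nu_x, nu_z)    # input distance above the fixed point
print(d_f, near, far)
```

Under these assumptions, an input distance below the fixed point maps to a smaller latent distance and one above it maps to a larger latent distance, which matches the pull and push behavior the caption describes.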