State Representation Learning Using an Unbalanced Atlas

Li Meng, Morten Goodwin, Anis Yazidi, Paal Engelstad

TL;DR

This work addresses state representation learning (SRL) in self-supervised learning (SSL) settings by exploiting a manifold-based representation learned via an unbalanced atlas (UA). It introduces DIM-UA, which adapts the ST-DIM framework to use dilated prediction targets and a maximum mean discrepancy-based UA loss to encourage informative chart usage. On AtariARI and CIFAR10, DIM-UA outperforms ST-DIM and MSimCLR; on AtariARI with 16384 hidden units, its mean F1 is about 0.75 versus about 0.70 for ST-DIM, and its performance stays stable and improves as more output heads are added. The results indicate that an unbalanced atlas enables richer, scalable manifold representations, offering a practical path to improved SRL in SSL pipelines.

Abstract

The manifold hypothesis posits that high-dimensional data often lies on a lower-dimensional manifold and that utilizing this manifold as the target space yields more efficient representations. While numerous traditional manifold-based techniques exist for dimensionality reduction, their application in self-supervised learning has witnessed slow progress. The recent MSimCLR method combines manifold encoding with SimCLR but requires extremely low target encoding dimensions to outperform SimCLR, limiting its applicability. This paper introduces a novel learning paradigm using an unbalanced atlas (UA), capable of surpassing state-of-the-art self-supervised learning approaches. We investigated and engineered the DeepInfomax with an unbalanced atlas (DIM-UA) method by adapting the Spatiotemporal DeepInfomax (ST-DIM) framework to align with our proposed UA paradigm. The efficacy of DIM-UA is demonstrated through training and evaluation on the Atari Annotated RAM Interface (AtariARI) benchmark, a modified version of the Atari 2600 framework that produces annotated image samples for representation learning. The UA paradigm improves existing algorithms significantly as the number of target encoding dimensions grows. For instance, the mean F1 score averaged over categories of DIM-UA is ~75% compared to ~70% of ST-DIM when using 16384 hidden units.

Paper Structure

This paper contains 17 sections, 9 equations, 6 figures, 6 tables, and 1 algorithm.

Figures (6)

  • Figure 1: The entropy of the output vector recorded epoch-wise when pretrained on the CIFAR10 dataset for a total of 1000 epochs, utilizing 8 charts and a dimensionality of 256.
  • Figure 2: A manifold $Z$ embedded in a higher dimension. Two domains are denoted by $U_\alpha$ and $U_\beta$ in $Z$. $\psi_\alpha$ and $\psi_\beta$ are the corresponding charts that map them to a lower dimensional Euclidean space. An atlas is then a collection of these charts that together cover the entire manifold.
  • Figure 3: The mean F1 and accuracy scores of 19 games when the total number of hidden units varies. The number of heads for DIM-UA is set to 4 here.
  • Figure 4: The mean F1 and accuracy scores of DIM-UA on 6 games when the number of output heads is 2, 4, or 8.
  • Figure 5: The mean F1 and accuracy scores on 6 games with different adaptations. All methods use 4 output heads.
  • ...and 1 more figure
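
Figure 1 tracks the entropy of the output vector over 8 charts, and Figure 2 describes an atlas as a collection of charts covering the manifold. As a rough illustration of what that entropy measures, the sketch below computes the Shannon entropy of a chart-membership distribution. It assumes the output vector is a softmax over per-chart logits, which this summary does not confirm; the function names and the example logits are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Turn per-chart logits into membership probabilities (assumed form)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

n_charts = 8  # matches the 8 charts mentioned in the Figure 1 caption

# Balanced usage: every chart is equally likely, so entropy is maximal (ln 8).
uniform = softmax([0.0] * n_charts)

# Unbalanced usage: one chart dominates, so entropy is lower.
# The logit value 4.0 is an arbitrary illustrative choice.
peaked = softmax([4.0] + [0.0] * (n_charts - 1))

h_uniform = entropy(uniform)
h_peaked = entropy(peaked)
print(f"uniform entropy: {h_uniform:.4f} nats")
print(f"peaked entropy:  {h_peaked:.4f} nats")
```

Under these assumptions, an entropy close to ln 8 would indicate that charts are used almost uniformly, while a lower value would indicate that membership concentrates on a few charts.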
