Homomorphism Autoencoder -- Learning Group Structured Representations from Observed Transitions

Hamza Keurti, Hsiao-Ru Pan, Michel Besserve, Benjamin F. Grewe, Bernhard Schölkopf

TL;DR

The paper introduces the Homomorphism Autoencoder (HAE), a self-supervised architecture that learns group-structured latent representations from observed state-action transitions without prior knowledge of the underlying group. By enforcing equivariance between environment transformations and latent encodings through a learned $\rho = \exp \circ \phi$, and using a two-step latent prediction plus reconstruction objective, HAEs recover symmetry-based representations and disentangle latent factors aligned with group decompositions. The framework is validated across multiple experiments, demonstrating toroidal latent structures, disentangled subspaces, Lie-algebra-based latent traversals, and informative long-horizon rollouts for groups such as $SO(2) \times SO(2) \times SO(2)$ and $SO(3)$. This approach advances interventional world modeling by linking actions to latent dynamics in a group-theoretic, unsupervised manner, with potential implications for neuroscience-inspired representation learning and robust model-based planning.

Abstract

How agents can learn internal models that veridically represent interactions with the real world is a largely open question. As machine learning is moving towards representations containing not just observational but also interventional knowledge, we study this problem using tools from representation learning and group theory. We propose methods enabling an agent acting upon the world to learn internal representations of sensory information that are consistent with actions that modify it. We use an autoencoder equipped with a group representation acting on its latent space, trained using an equivariance-derived loss in order to enforce a suitable homomorphism property on the group representation. In contrast to existing work, our approach does not require prior knowledge of the group and does not restrict the set of actions the agent can perform. We motivate our method theoretically, and show empirically that it can learn a group representation of the actions, thereby capturing the structure of the set of transformations applied to the environment. We further show that this allows agents to predict the effect of sequences of future actions with improved accuracy.
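
As a rough illustration of the parameterization $\rho = \exp \circ \phi$ mentioned in the TL;DR, the sketch below maps a one-dimensional action to a skew-symmetric matrix and exponentiates it. The function `phi` here is a fixed, hand-written stand-in (an assumption for illustration; in the HAE it is learned), and `expm` is a simple truncated Taylor series rather than a production routine. The point is only that exponentiating a Lie-algebra element yields a matrix satisfying the homomorphism property $\rho(a)\rho(b) = \rho(a+b)$ for this commutative example.

```python
import math

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(m, terms=30):
    """Matrix exponential via a truncated Taylor series (adequate for small inputs)."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term_k = term_{k-1} @ m / k
        term = [[x / k for x in row] for row in matmul(term, m)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

def phi(angle):
    """Hypothetical map from a 1-D action to the Lie algebra so(2)."""
    return [[0.0, -angle], [angle, 0.0]]

def rho(angle):
    """Group representation rho = exp . phi; here a 2x2 rotation matrix."""
    return expm(phi(angle))

def max_abs_diff(a, b):
    """Largest entry-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

a, b = 0.7, -1.9
# Homomorphism property: composing actions equals multiplying their matrices.
homomorphism_gap = max_abs_diff(matmul(rho(a), rho(b)), rho(a + b))
# The exponential of a skew-symmetric 2x2 matrix is a plane rotation.
rotation_gap = max_abs_diff(
    rho(a), [[math.cos(a), -math.sin(a)], [math.sin(a), math.cos(a)]])
print(homomorphism_gap, rotation_gap)
```

Both printed gaps should be at the level of floating-point round-off. For non-commutative groups such as $SO(3)$ the additive identity $\rho(a+b)$ no longer applies in this simple form, so this sketch covers the $SO(2)$ case only.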

Paper Structure (55 sections, 5 theorems, 48 equations, 18 figures, 5 tables)

Key Result

Proposition \ref{prop:twosteps}

Under the generative model of Section sec:sbdrl with $b$ diffeomorphic onto its image and Assumption assum:linearworld, consider a setting where sample paths have a strictly positive density on a $G$-invariant support. If $(\rho,h,d)$ are continuous and minimize the expectation of $\mathcal{L}_{pred}^2(

Figures (18)

  • Figure 1: Left: Commutative diagram of a group-structured representation. Right: Toroidal latent world for a moving 2D object observation, parameterized by the coordinates of the object's center (red). Dashed lines describe the unwrapping process.
  • Figure 2: The Homomorphism Autoencoder (HAE) consisting of $h$ (encoder), $d$ (decoder) and $\rho=\exp \circ \, \phi$ (group representation) relies on 2-step latent prediction to jointly learn the group representation $\rho$ and the observation representation $h$. The HAE learns by jointly minimizing both the latent prediction loss and the reconstruction loss (dotted connections) to simultaneously learn representations of the observations and the group actions.
  • Figure 3: Top: Projection of the HAE's $8$-dimensional latent representation vectors $z$ of the translated heart dataset, exhibiting the 2D toroidal structure of the world's state. Color indicates the heart's $x$ position, while markers indicate $y$ position. Bottom: Evaluation of the learned and disentangled group representation $\rho$ for actions over a grid centered on the identity element. Arrows indicate the direction of actions $g$. The representation trivializes the subspace spanned by the indices $1,3,4,5$, while $C_x$ and $C_y$ act respectively on dimensions $[2,3]$ and $[7,8]$ through rotation matrices.
  • Figure 4: We visualize the linear traversal of the group algebra for the dSprites experiment and its effect on the predicted image reconstruction. The first row corresponds to a traversal $tA_1$ (horizontal displacement), while the second row corresponds to the traversal $tA_2$ (vertical displacement).
  • Figure 5: Step-wise reconstruction loss on the test dataset. Lines and shadings represent median and interquartile range over 50 random seeds.
  • ...and 13 more figures
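
The 2-step latent prediction and reconstruction objective described in the Figure 2 caption can be sketched on a toy problem. Everything below is an assumption made for illustration: observations are points on the unit circle, `encoder` and `decoder` are identity stand-ins for $h$ and $d$, and `rho` has a single scalar `scale` playing the role of the learned parameter. The exact form of the paper's losses is not given in this summary, so the summed squared errors here are one plausible instance, not the authors' definition.

```python
import math

def rot(angle):
    """2x2 rotation matrix as nested lists."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

def apply(m, v):
    """Apply a 2x2 matrix to a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def encoder(o):
    return list(o)  # hypothetical stand-in for h

def decoder(z):
    return list(z)  # hypothetical stand-in for d

def rho(g, scale):
    """Toy representation: rotate the latent by scale * g."""
    return rot(scale * g)

def two_step_loss(obs, actions, scale):
    """obs = (o_t, o_{t+1}, o_{t+2}), actions = (g_t, g_{t+1})."""
    z = encoder(obs[0])
    pred_loss, rec_loss = 0.0, 0.0
    for step, g in enumerate(actions, start=1):
        z = apply(rho(g, scale), z)                   # roll the latent forward
        pred_loss += sq_dist(encoder(obs[step]), z)   # latent prediction error
        rec_loss += sq_dist(obs[step], decoder(z))    # reconstruction error
    return pred_loss, rec_loss

def observe(theta):
    """Toy world: the observation is the point at angle theta on the circle."""
    return [math.cos(theta), math.sin(theta)]

theta, g0, g1 = 0.3, 0.5, -1.1
obs = (observe(theta), observe(theta + g0), observe(theta + g0 + g1))
good = two_step_loss(obs, (g0, g1), scale=1.0)  # representation matches the world
bad = two_step_loss(obs, (g0, g1), scale=0.5)   # representation is mis-scaled
print(good, bad)
```

With the matching representation both losses vanish up to round-off, while the mis-scaled one is penalized at every step of the rollout. Because the stand-in encoder and decoder are identities, the two losses coincide here; with learned networks they would differ.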

Theorems & Definitions (17)

  • Proposition \ref{prop:twosteps}: informal
  • Definition 1.1: Group
  • Definition 1.2: Group Action
  • Definition 1.3: Group Representation
  • Definition 1.4: Lie Group
  • Definition 1.5: Exponential Map
  • Definition 1.6: Transitive Group Action
  • Definition 1.7: Faithful Group Action
  • Definition 1.8: Orbit by a Group Action
  • Proposition \ref{prop:twosteps}
  • ...and 7 more
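
Several of the definitions listed above (group action, orbit, transitive and faithful actions) can be checked mechanically on a small finite example. The sketch below uses a discretized torus, with $\mathbb{Z}_N \times \mathbb{Z}_N$ translating an object on an $N \times N$ grid with wrap-around. This finite setting and the grid size `N` are assumptions chosen for illustration; the paper itself works with continuous Lie groups.

```python
from itertools import product

N = 4  # hypothetical grid size
GROUP = list(product(range(N), repeat=2))    # elements of Z_N x Z_N
STATES = list(product(range(N), repeat=2))   # object positions on an N x N torus
IDENTITY = (0, 0)

def compose(g, h):
    """Group law: component-wise addition modulo N."""
    return ((g[0] + h[0]) % N, (g[1] + h[1]) % N)

def act(g, s):
    """Group action: translate the position s by g with wrap-around."""
    return ((s[0] + g[0]) % N, (s[1] + g[1]) % N)

def orbit(s):
    """Orbit of s: every state reachable from s by some group element."""
    return {act(g, s) for g in GROUP}

# Group action axioms: the identity acts trivially, and composition is respected.
is_action = all(act(IDENTITY, s) == s for s in STATES) and all(
    act(compose(g, h), s) == act(g, act(h, s))
    for g in GROUP for h in GROUP for s in STATES)

# Transitive: a single orbit covers the whole state space.
is_transitive = all(orbit(s) == set(STATES) for s in STATES)

# Faithful: only the identity leaves every state fixed.
is_faithful = all(
    any(act(g, s) != s for s in STATES) for g in GROUP if g != IDENTITY)

print(is_action, is_transitive, is_faithful, len(orbit((0, 0))))
```

For this translation action all three checks hold, and the orbit of any position is the full grid of $N^2$ states.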