Probabilistic World Modeling with Asymmetric Distance Measure

Meng Song

TL;DR

We show that a geometric abstraction of probabilistic world dynamics can be embedded into the representation space through asymmetric contrastive learning. The method learns an asymmetric similarity function that reflects state reachability and allows multi-way probabilistic inference.

Abstract

Representation learning is a fundamental task in machine learning, aiming at uncovering structures from data to facilitate subsequent tasks. However, what is a good representation for planning and reasoning in a stochastic world remains an open problem. In this work, we posit that learning a distance function is essential to allow planning and reasoning in the representation space. We show that a geometric abstraction of the probabilistic world dynamics can be embedded into the representation space through asymmetric contrastive learning. Unlike previous approaches that focus on learning mutual similarity or compatibility measures, we instead learn an asymmetric similarity function that reflects the state reachability and allows multi-way probabilistic inference. Moreover, by conditioning on a common reference state (e.g. the observer's current state), the learned representation space allows us to discover the geometrically salient states that only a handful of paths can lead through. These states can naturally serve as subgoals to break down long-horizon planning tasks. We evaluate our method in gridworld environments with various layouts and demonstrate its effectiveness in discovering the subgoals.
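To make the idea of an asymmetric similarity function concrete, the following is a minimal sketch and not the paper's implementation. It assumes two separate encoders, here the hypothetical names `phi` for the source state and `psi` for the target state, implemented as fixed random linear maps standing in for trained networks. The InfoNCE-style loss is a common contrastive objective used here for illustration; the paper's actual objective may differ.

```python
import math
import random

def make_linear_encoder(in_dim, out_dim, seed):
    """Hypothetical encoder: a fixed random linear map (stand-in for a trained network)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(state):
        return [sum(w * s for w, s in zip(row, state)) for row in weights]

    return encode

# Two different encoders (an assumption of this sketch): one per role.
phi = make_linear_encoder(2, 4, seed=0)  # encodes the source state
psi = make_linear_encoder(2, 4, seed=1)  # encodes the target state

def similarity(source, target):
    """Asymmetric score: inner product of the source and target encodings."""
    return sum(a * b for a, b in zip(phi(source), psi(target)))

def info_nce_loss(source, positive, negatives):
    """InfoNCE-style loss: the positive target should outscore the negatives."""
    scores = [similarity(source, positive)]
    scores += [similarity(source, n) for n in negatives]
    m = max(scores)  # subtract the max for a numerically stable log-sum-exp
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_norm - scores[0]

x, y = [0.0, 1.0], [3.0, 2.0]
forward = similarity(x, y)   # score for the ordered pair (x, y)
backward = similarity(y, x)  # score for the ordered pair (y, x)
loss = info_nce_loss(x, y, negatives=[[5.0, 5.0], [1.0, 4.0]])
```

Because `phi` and `psi` are different maps, `similarity(x, y)` and `similarity(y, x)` generally differ, which is what lets such a score encode a directed notion like reachability rather than a mutual one.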

Paper Structure (22 sections, 24 equations, 6 figures)

Introduction
Preliminaries
Markov chain and the directed transition graph
MDP and the environment graph
Problem formulation
Vertex reachability
C-step approximation
Asymmetric contrastive representation learning
Asymmetric encoders
The choice of the negative distribution
Reference state conditioned distance measure
Subgoal discovery
Experiments
t-SNE visualization of the learned representations
Subgoal discovery results
...and 7 more sections

Figures (6)

Figure 1: Gridworld environments: Grey areas indicate the walls. The yellow star indicates the initial state ${\mathbf{s}}_0$. When an agent collides with walls or attempts to move beyond the boundaries of the environment, its movement is blocked, and the agent remains in its current position.
Figure 2: Visualization of the original states and the learned representations. The states are colored to visualize their position correspondences between two spaces. In each environment, we visualize the representation space from two different perspectives. The reference states ${\mathbf{r}}$ are indicated by the red stars. In each group, the left plot shows the original states in the 2D Euclidean space and the right plot shows the t-SNE projection of the learned representations. In all the experiments, we set approximation step size $C=16$, and train the encoders on a single episode of length $T=153600$.
Figure 3: Subgoal discovery results. The states are colored according to the cluster labels in both the original space and the learned representation space. The gray states are subgoals. In each environment, we visualize the clustering results from two different perspectives. The reference states ${\mathbf{r}}$ are indicated by the red stars. In each group, the left plot shows the original states in the 2D Euclidean space and the right plot shows the t-SNE projection of the learned representations. In all the experiments, we set approximation step size $C=16$, and train the encoders on a single episode of length $T=153600$.
Figure 4: Visualization of the original states and the learned representations with different approximation step sizes $C$ in Four Rooms environment. The embeddings are projected to 2D plots by t-SNE. In all the experiments, we train the encoders on a single episode of length $T=153600$.
Figure 5: Learned representations with different negative distributions in Four Rooms environment when $C=16$. Left column: $P_n(X)=P_X(X)$, Middle column: $P_n(X)=P_Y(X)$, Right column: $P_n(X)=U(X)$. Each row corresponds to the results with a different episode length.
...and 1 more figure
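The blocked-movement rule described in the Figure 1 caption can be sketched as a deterministic transition function. This is an illustrative sketch only: the layout `GRID`, the four-action set `ACTIONS`, and the function name `step` are assumptions and are not taken from the paper.

```python
# Hypothetical layout: '#' marks a wall cell, '.' a free cell.
GRID = [
    "....",
    ".#..",
    "....",
]

# Assumed action set: four cardinal moves as (row, column) offsets.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Return the next state; a blocked move leaves the agent where it is."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    in_bounds = 0 <= new_row < len(GRID) and 0 <= new_col < len(GRID[0])
    if not in_bounds or GRID[new_row][new_col] == "#":
        return state  # hit a boundary or a wall: position is unchanged
    return (new_row, new_col)
```

For example, `step((0, 0), "up")` leaves the agent at `(0, 0)` because the move would leave the grid, and `step((0, 1), "down")` leaves it at `(0, 1)` because the cell below is a wall.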
