Learning The Minimum Action Distance

Lorenzo Steccanella, Joshua B. Evans, Özgür Şimşek, Anders Jonsson

TL;DR

This work addresses learning a state-space distance that equals the Minimum Action Distance $d_{\text{MAD}}(s,s')$, the minimum number of actions needed to reach $s'$ from $s$, without reward signals or action data. It introduces self-supervised, trajectory-based training with state embeddings and a quasimetric to capture potentially asymmetric distances, via two methods: MadDist (direct distance learning) and TDMadDist (temporal-difference bootstrapping). The authors prove that MAD corresponds to a shortest-path distance on the one-step reachability graph and demonstrate the approach across diverse environments with ground-truth MAD, showing that MadDist, particularly with a simple quasimetric, yields high-fidelity MAD embeddings and strong planning performance. This enables robust goal-conditioned planning and transfer in reward-free settings, with practical impact for planning and representation learning in complex, possibly asymmetric MDPs.

Abstract

This paper presents a state representation framework for Markov decision processes (MDPs) that can be learned solely from state trajectories, requiring neither reward signals nor the actions executed by the agent. We propose learning the minimum action distance (MAD), defined as the minimum number of actions required to transition between states, as a fundamental metric that captures the underlying structure of an environment. MAD naturally enables critical downstream tasks such as goal-conditioned reinforcement learning and reward shaping by providing a dense, geometrically meaningful measure of progress. Our self-supervised learning approach constructs an embedding space where the distances between embedded state pairs correspond to their MAD, accommodating both symmetric and asymmetric approximations. We evaluate the framework on a comprehensive suite of environments with known MAD values, encompassing both deterministic and stochastic dynamics, as well as discrete and continuous state spaces, and environments with noisy observations. Empirical results demonstrate that the proposed approach not only efficiently learns accurate MAD representations across these diverse settings but also significantly outperforms existing state representation methods in terms of representation quality.
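To make the definition concrete, here is a minimal tabular sketch of MAD as a shortest-path distance on the one-step reachability graph. It is not the paper's learned-embedding method: it assumes a small discrete state space, builds the graph only from transitions observed in state trajectories, and runs breadth-first search. The function names `reachability_graph` and `mad_from` and the toy trajectories are hypothetical. Because unobserved transitions are missing from the graph, the resulting values are upper bounds on the true MAD.

```python
from collections import deque


def reachability_graph(trajectories):
    """Build directed one-step edges s -> s' from consecutive states.

    Only states are used; no actions or rewards are needed.
    """
    graph = {}
    for traj in trajectories:
        for s in traj:
            graph.setdefault(s, set())
        for s, s_next in zip(traj, traj[1:]):
            graph[s].add(s_next)
    return graph


def mad_from(graph, source):
    """Breadth-first search from `source`.

    Returns a dict mapping each reachable state to the minimum number
    of observed one-step transitions needed to reach it. States that
    are not reachable from `source` are omitted.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        s = queue.popleft()
        for s_next in graph[s]:
            if s_next not in dist:
                dist[s_next] = dist[s] + 1
                queue.append(s_next)
    return dist


# Toy state-only trajectories (hypothetical data).
trajectories = [["a", "b", "c", "d"], ["d", "a"], ["b", "d"]]
graph = reachability_graph(trajectories)

d_a = mad_from(graph, "a")  # distances from state "a"
d_d = mad_from(graph, "d")  # distances from state "d"

print(d_a["d"], d_d["a"])  # 2 1: the distance is asymmetric
```

In this toy graph, reaching `d` from `a` takes two steps (`a -> b -> d`), while reaching `a` from `d` takes one, which is the kind of asymmetry a quasimetric embedding is meant to represent.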

Paper Structure

This paper contains 39 sections, 4 theorems, 33 equations, 9 figures, 5 tables.

Key Result

Theorem 1

The Minimum Action Distance, $d_{\text{MAD}}$, as defined above, is the unique solution to the constrained optimization problem:

Figures (9)

  • Figure 1: Schematic overview of MAD representation learning. From left to right: (1) the hidden environment graph, (2) trajectories collected by an unknown policy, (3) the embedding function $\phi:S \rightarrow \mathbb{R}^2$ and (4) the resulting MAD embedding space in $\mathbb{R}^2$.
  • Figure 2: A subset of the environments used in our analysis.
  • Figure 3: Pearson correlation coefficients and coefficient of variation (CV) ratios across a selection of test environments. Shaded regions show the minimum and maximum values across three random seeds.
  • Figure 4: Impact of latent size on Spearman correlation, Pearson correlation and Ratio CV of the MadDist and TDMadDist algorithms, evaluated in the CliffWalking environment. Shaded regions show the range of values across five random seeds, with upper and lower boundaries representing maximum and minimum values.
  • Figure 5: Impact of different quasimetric functions on correlation and Ratio CV of the MadDist algorithm, evaluated in the CliffWalking environment. Shaded regions show the range of values across five random seeds, with upper and lower boundaries representing maximum and minimum values.
  • ...and 4 more figures

Theorems & Definitions (12)

  • Definition 1: Minimum Action Distance
  • Theorem 1
  • proof
  • Definition 2: ReLU Reduction
  • Proposition 1
  • proof
  • Definition 3: Max Reduction
  • Definition 4: Sum and Mean Reductions
  • Proposition 2
  • proof
  • ...and 2 more
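The definitions above are listed by name only, so the following is a generic illustration of what a quasimetric on embedding vectors can look like, not necessarily any of the paper's reductions. It assumes the standard construction $d(x, y) = \max_i \max(0, y_i - x_i)$, which is zero for identical points, satisfies the triangle inequality, and is generally asymmetric. The function name `relu_max_quasimetric` is hypothetical.

```python
import random


def relu_max_quasimetric(x, y):
    """Asymmetric distance d(x, y) = max_i max(0, y_i - x_i).

    An assumed, textbook-style quasimetric on real vectors; it is used
    here only to illustrate asymmetry and the triangle inequality.
    """
    return max(max(0.0, yi - xi) for xi, yi in zip(x, y))


x = (0.0, 0.0)
y = (1.0, -2.0)

d_xy = relu_max_quasimetric(x, y)  # only coordinate 0 increases, by 1
d_yx = relu_max_quasimetric(y, x)  # only coordinate 1 increases, by 2

# Empirical check of the triangle inequality on random 3-d points.
rng = random.Random(0)
triangle_ok = True
for _ in range(1000):
    a, b, c = ([rng.uniform(-1, 1) for _ in range(3)] for _ in range(3))
    lhs = relu_max_quasimetric(a, c)
    rhs = relu_max_quasimetric(a, b) + relu_max_quasimetric(b, c)
    if lhs > rhs + 1e-12:
        triangle_ok = False
```

Here `d_xy` and `d_yx` differ, which a symmetric metric such as the Euclidean distance cannot express.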