On Learning Informative Trajectory Embeddings for Imitation, Classification and Regression

Zichang Ge, Changyu Chen, Arunesh Sinha, Pradeep Varakantham

TL;DR

This work addresses the challenge of learning informative trajectory embeddings from state-action sequences without reward labels. It introduces a two-stage framework. The first stage extracts latent skills via LOVE-based compression, yielding skill variables $\mathbf{z}_{1:T}$ and boundary variables $\mathbf{m}_{1:T}$. The second stage applies a variational trajectory encoder (VTE) that passes these variables through a transformer to produce a latent ability vector $\mathbf{e}$ usable across imitation, classification, clustering, and regression. The method demonstrates strong downstream performance, reveals a disentangled and controllable embedding structure, and generalizes across tasks in diverse environments. It enables conditional imitation with ability-conditioned policies, high-accuracy trajectory classification, and reliable return prediction, all without relying on reward signals.

Abstract

In real-world sequential decision-making tasks such as autonomous driving, robotics, and healthcare, learning from observed state-action trajectories is critical for downstream tasks such as imitation, classification, and clustering. For example, self-driving cars must replicate human driving behaviors, while robots and healthcare systems benefit from modeling decision sequences, whether or not they come from expert data. Existing trajectory encoding methods often focus on specific tasks or rely on reward signals, limiting their ability to generalize across domains and tasks. Inspired by the success of embedding models like CLIP and BERT in static domains, we propose a novel method for embedding state-action trajectories into a latent space that captures the skills and competencies in the dynamic underlying decision-making processes. This method operates without the need for reward labels, enabling better generalization across diverse domains and tasks. Our contributions are threefold: (1) We introduce a trajectory embedding approach that captures multiple abilities from state-action data. (2) The learned embeddings exhibit strong representational power across downstream tasks, including imitation, classification, clustering, and regression. (3) The embeddings demonstrate unique properties, such as controlling agent behaviors in IQ-Learn and an additive structure in the latent space. Experimental results confirm that our method outperforms traditional approaches, offering more flexible and powerful trajectory representations for various applications. Our code is available at https://github.com/Erasmo1015/vte.

Paper Structure

This paper contains 30 sections, 1 theorem, 10 equations, 8 figures, 10 tables, 1 algorithm.

Key Result

Proposition 1

For any given environment, ${\arg\max}_\theta \log p_\theta(\tau | {\bm{e}}) = {\arg\max}_\theta \sum_t \log p_\theta({\bm{a}}_t|{\bm{x}}_t, {\bm{e}})$.
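
The proposition can be motivated by a short factorization argument. The sketch below is not reproduced from the paper's proof. It assumes that the trajectory is $\tau = ({\bm{x}}_1, {\bm{a}}_1, \dots, {\bm{x}}_T, {\bm{a}}_T)$, that the policy is Markov, and that the initial-state distribution $p({\bm{x}}_1)$ and the transition dynamics $p({\bm{x}}_{t+1} \mid {\bm{x}}_t, {\bm{a}}_t)$ depend on neither $\theta$ nor ${\bm{e}}$; the horizon $T$ and these two distributions are notation introduced here for illustration.

```latex
\begin{align*}
\log p_\theta(\tau \mid {\bm{e}})
  &= \log p({\bm{x}}_1)
   + \sum_{t=1}^{T} \log p_\theta({\bm{a}}_t \mid {\bm{x}}_t, {\bm{e}})
   + \sum_{t=1}^{T-1} \log p({\bm{x}}_{t+1} \mid {\bm{x}}_t, {\bm{a}}_t) \\
\Rightarrow\quad
{\arg\max}_\theta \log p_\theta(\tau \mid {\bm{e}})
  &= {\arg\max}_\theta \sum_{t=1}^{T} \log p_\theta({\bm{a}}_t \mid {\bm{x}}_t, {\bm{e}})
\end{align*}
```

Under these assumptions the first and third terms are constant in $\theta$, so maximizing the trajectory likelihood reduces to maximizing the per-step action log-likelihoods, which is a behavior-cloning-style objective conditioned on ${\bm{e}}$.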

Figures (8)

  • Figure 1: Illustration of the VTE framework. For the encoder, we use the pretrained SE-Logit (introduced in the LOVE section) to extract the skill variable ${\bm{z}}_{1:T}$ and the boundary variable ${\bm{m}}_{1:T}$. These are then passed through separate MLPs, mapping ${\bm{z}}_{1:T}$ and ${\bm{m}}_{1:T}$ to ${\bm{e}}^z_{1:T}$ and ${\bm{e}}^m_{1:T}$, respectively, which are of equal size. At each time step, we concatenate these embeddings, resulting in ${\bm{e}}^{z,m}_{1:T}$, where ${\bm{e}}^{z,m}_{i}=\textrm{Concat}({\bm{e}}^z_i, {\bm{e}}^m_i)$. ${\bm{e}}^{z,m}_{1:T}$ is then fed into a transformer to compute the posterior $q_\phi({\bm{e}}|\tau)$. For the decoder, the action ${\bm{a}}_t$ is predicted from the state ${\bm{x}}_t$, conditioned on the trajectory embedding ${\bm{e}}$.
  • Figure 2: t-SNE clustering analysis.
  • Figure 3: Evaluation curve of returns on Hopper for different ability levels.
  • Figure 4: Visual comparison of the change in behavior in the Walker2D environment (left column) and the Hopper environment (right column) when one dimension of the trajectory embedding is changed. Each of the four boxes shows the result of perturbing one of the 10 dimensions of the trajectory embedding. Each box contains three rows, each showing a sequence of frames. First row: the value of the dimension is decreased. Second row: the value is left unperturbed, as a control. Third row: the value of the dimension is increased (see more description in text).
  • Figure 5: Heatmap of the Wasserstein distance (see definition in text) between the distributions of trajectory embeddings for different ability levels.
  • ...and 3 more figures
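
The encoder path described in the Figure 1 caption can be sketched in a few lines of plain Python. This is a minimal illustration of the data flow only, not the authors' implementation: the weights are random instead of learned, a single self-attention layer with mean pooling stands in for the transformer, the boundary variable is assumed to be a scalar per step, and the names `encode`, `sample`, `Z_DIM`, and `EMB` are hypothetical. Only the 10-dimensional embedding size is taken from the Figure 4 caption.

```python
import math
import random

random.seed(0)

Z_DIM = 4    # assumed size of each skill variable z_t
EMB = 8      # assumed common size of e^z_t and e^m_t
E_DIM = 10   # trajectory-embedding size (10 dimensions per the Figure 4 caption)


def linear(n_in, n_out):
    """Random (n_out x n_in) weight matrix standing in for a learned layer."""
    return [[random.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def apply(W, x):
    """Matrix-vector product W @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def mlp(W, x):
    """One-layer MLP with tanh, a placeholder for the separate MLPs."""
    return [math.tanh(v) for v in apply(W, x)]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq, Wq, Wk, Wv):
    """Single-head self-attention over time steps (stand-in for the transformer)."""
    q = [apply(Wq, h) for h in seq]
    k = [apply(Wk, h) for h in seq]
    v = [apply(Wv, h) for h in seq]
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(d)])
    return out


def encode(z_seq, m_seq, params):
    """Map (z_{1:T}, m_{1:T}) to the mean and log-variance of q(e | tau)."""
    e_z = [mlp(params["Wz"], z) for z in z_seq]
    e_m = [mlp(params["Wm"], [m]) for m in m_seq]
    e_zm = [ez + em for ez, em in zip(e_z, e_m)]    # per-step Concat(e^z_i, e^m_i)
    h = self_attention(e_zm, params["Wq"], params["Wk"], params["Wv"])
    pooled = [sum(col) / len(h) for col in zip(*h)]  # mean over time (assumption)
    return apply(params["Wmu"], pooled), apply(params["Wlv"], pooled)


def sample(mu, logvar):
    """Reparameterized sample e = mu + sigma * eps from the Gaussian posterior."""
    return [m + math.exp(0.5 * lv) * random.gauss(0, 1) for m, lv in zip(mu, logvar)]


params = {
    "Wz": linear(Z_DIM, EMB), "Wm": linear(1, EMB),
    "Wq": linear(2 * EMB, 2 * EMB), "Wk": linear(2 * EMB, 2 * EMB),
    "Wv": linear(2 * EMB, 2 * EMB),
    "Wmu": linear(2 * EMB, E_DIM), "Wlv": linear(2 * EMB, E_DIM),
}

# Toy trajectory of length T = 5 with made-up skill and boundary values.
z_seq = [[random.gauss(0, 1) for _ in range(Z_DIM)] for _ in range(5)]
m_seq = [1.0, 0.0, 0.0, 1.0, 0.0]
mu, logvar = encode(z_seq, m_seq, params)
e = sample(mu, logvar)
print(len(e))  # one value per embedding dimension
```

In a full model, the sampled `e` would condition the decoder that predicts ${\bm{a}}_t$ from ${\bm{x}}_t$, and perturbing a single entry of `e` would correspond to the dimension-wise experiments shown in Figure 4.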

Theorems & Definitions (1)

  • Proposition 1