TimewarpVAE: Simultaneous Time-Warping and Representation Learning of Trajectories

Travers Rhodes; Daniel D. Lee

TL;DR

TimewarpVAE tackles the problem of representing variable-speed trajectories by disentangling timing from spatial structure using a differentiable DTW-based time warp. It extends beta-VAE with a time encoder and a time-warping module, and introduces a regularization term to avoid degenerate warps. It achieves lower aligned spatial reconstruction error than baselines on fork-manipulation and handwriting datasets and can generate semantically meaningful novel trajectories, including faster ones for robotic execution. This work advances trajectory representation learning with practical implications for rapid and energy-efficient robot motion.
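
For orientation, the beta-VAE objective that the summary above builds on can be written in its standard general form. The notation here ($q(z \mid x)$ for the encoder distribution, $p(x \mid z)$ for the decoder likelihood, $p(z)$ for the prior) is the common textbook convention and is an assumption, not necessarily the paper's exact formulation:

```latex
\mathcal{L}_{\beta\text{-VAE}}(x)
  = \mathbb{E}_{q(z \mid x)}\!\left[ -\log p(x \mid z) \right]
  + \beta \, D_{\mathrm{KL}}\!\left( q(z \mid x) \,\|\, p(z) \right)
```

The first term is the reconstruction loss and the second penalizes the latent distribution for deviating from the prior, with $\beta$ setting the trade-off between the two.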

Abstract

Human demonstrations of trajectories are an important source of training data for many machine learning problems. However, the difficulty of collecting human demonstration data for complex tasks makes learning efficient representations of those trajectories challenging. For many problems, such as dexterous manipulation, the exact timings of the trajectories should be factored from their spatial path characteristics. In this work, we propose TimewarpVAE, a fully differentiable manifold-learning algorithm that incorporates Dynamic Time Warping (DTW) to simultaneously learn both timing variations and latent factors of spatial variation. We show how the TimewarpVAE algorithm learns appropriate time alignments and meaningful representations of spatial variations in handwriting and fork manipulation datasets. Our results have lower spatial reconstruction test error than baseline approaches, and the learned low-dimensional representations can be used to efficiently generate semantically meaningful novel trajectories. We demonstrate the utility of our algorithm to generate novel high-speed trajectories for a robotic arm.
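
To make the idea of factoring timing from spatial path concrete, the following is a minimal NumPy sketch of a monotone, piecewise-linear time warp applied to a sampled trajectory. It is an illustration under stated assumptions, not the authors' implementation: the function names `piecewise_linear_warp` and `warp_trajectory`, the softmax parameterization of the segment increments, and the direction in which the warp is applied are all hypothetical choices made for this example.

```python
import numpy as np


def piecewise_linear_warp(t, theta):
    """Monotone warp phi: [0, 1] -> [0, 1] built from K equal-width segments.

    theta is an unconstrained array of shape (K,). A softmax turns it into
    positive increments that sum to 1, so phi(0) = 0, phi(1) = 1, and phi
    is strictly increasing. (Hypothetical parameterization for illustration.)
    """
    theta = np.asarray(theta, dtype=float)
    increments = np.exp(theta - theta.max())  # numerically stable softmax
    increments /= increments.sum()
    knots_in = np.linspace(0.0, 1.0, len(theta) + 1)
    knots_out = np.concatenate([[0.0], np.cumsum(increments)])
    return np.interp(t, knots_in, knots_out)


def warp_trajectory(traj, theta):
    """Resample traj of shape (T, D) at warped times: y(t) = x(phi(t)).

    The spatial path is unchanged; only the speed along it varies.
    """
    num_steps, num_dims = traj.shape
    t = np.linspace(0.0, 1.0, num_steps)
    warped_t = piecewise_linear_warp(t, theta)
    columns = [np.interp(warped_t, t, traj[:, d]) for d in range(num_dims)]
    return np.stack(columns, axis=1)


if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 11)
    traj = np.stack([t, t ** 2], axis=1)  # a simple 2-D path
    fast_start = warp_trajectory(traj, np.array([2.0, -1.0, 0.5, 0.0]))
    print(fast_start.shape)
```

With all entries of `theta` equal, the warp is the identity and the trajectory is returned unchanged. Unequal entries speed up some segments and slow down others while tracing the same path, which is the kind of timing variation a time-warping module is meant to absorb.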

Paper Structure (42 sections, 36 equations, 15 figures, 2 tables)

Introduction
Related Work
Technical Approach
Beta-VAE
TimewarpVAE
Neural Network Formulation
Architecture for the time-warper
Neural network architecture for the temporal encoder
Neural network architecture for the spatial encoder
Neural network architecture for the decoder
Regularization of Time-Warping Function
Experiments
Fork Trajectory Dataset
Handwriting Gestures Dataset
Model Performance Measures
...and 27 more sections

Figures (15)

Figure 1: TimewarpVAE learns a low-dimensional latent representation of complex trajectories that explicitly factorizes timing and spatial styles. The Kinova Gen3 robot arm is able to draw various versions of the letter "A" more quickly by speeding up or slowing down different parts of the trajectory to obey dynamical mechanical constraints. The resulting end-effector path is overlaid on the images. A video and more details are provided in the Supplemental Materials.
Figure 2: Interpolations in latent space between canonical trajectories using various models. For Rate Invariant Autoencoder and TimewarpVAE, we use a sixteen-dimensional spatial latent space and the interpolation is constructed by decoding the average of the spatial latent embeddings. The resulting average trajectory is plotted alongside the reconstructions of the original two trajectories. The Rate Invariant Autoencoder can ignore parts of the canonical trajectory during training, leading to the jittering seen at the beginning and end of the canonical trajectory.
Figure 3: The architecture for Beta-VAE. Beta-VAE takes in a trajectory $x$, encodes it into a latent distribution parameterized by $z$ and $\log(\sigma^2)$, and decodes to a trajectory $\tilde{x}$.
Figure 4: The architecture for TimewarpVAE. TimewarpVAE takes in a full trajectory $x$ and a timestamp $t$ and reconstructs the position $p$ of the trajectory at that timestamp. TimewarpVAE separately encodes the timing of the trajectory into $\Theta$ and encodes the spatial information into a latent distribution parameterized by $z$ and $\log(\sigma^2)$.
Figure 5: We collect trajectory recordings of the position and orientation of a fork while it is used to pick a small piece of yarn off a plate with steep sides. Example trajectories are presented from two angles, showing the initial orientation of the fork and the position of the tip of the fork over time.
...and 10 more figures
