WalkTheDog: Cross-Morphology Motion Alignment via Phase Manifolds

Peizhuo Li, Sebastian Starke, Yuting Ye, Olga Sorkine-Hornung

TL;DR

The proposed vector quantized periodic autoencoder learns a shared phase manifold for multiple characters, such as a human and a dog, without any supervision. The manifold's capability for timing and semantics alignment is demonstrated in several applications, including motion retrieval, transfer, and stylization.

Abstract

We present a new approach for understanding the periodicity structure and semantics of motion datasets, independently of the morphology and skeletal structure of characters. Unlike existing methods using an overly sparse high-dimensional latent, we propose a phase manifold consisting of multiple closed curves, each corresponding to a latent amplitude. With our proposed vector quantized periodic autoencoder, we learn a shared phase manifold for multiple characters, such as a human and a dog, without any supervision. This is achieved by exploiting the discrete structure and a shallow network as bottlenecks, such that semantically similar motions are clustered into the same curve of the manifold, and the motions within the same component are aligned temporally by the phase variable. In combination with an improved motion matching framework, we demonstrate the manifold's capability of timing and semantics alignment in several applications, including motion retrieval, transfer and stylization. Code and pre-trained models for this paper are available at https://peizhuoli.github.io/walkthedog.
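
To make the geometry concrete, the following is a minimal sketch of a manifold made of closed curves, one per amplitude vector. The parametrization (each amplitude channel scaling a sine/cosine pair), the function name `manifold_point`, and the toy amplitude values are assumptions for illustration; the abstract does not specify the exact form used in the paper.

```python
import numpy as np

def manifold_point(amplitude, phase):
    """Map (amplitude vector, phase in cycles) to a point on a closed curve.

    Assumed parametrization: each amplitude channel scales one (sin, cos) pair,
    so a fixed amplitude traces a closed curve as the phase runs from 0 to 1.
    """
    amplitude = np.asarray(amplitude, dtype=float)
    angle = 2.0 * np.pi * phase
    return np.concatenate([amplitude * np.sin(angle), amplitude * np.cos(angle)])

# Two hypothetical amplitude vectors, i.e. two distinct curves of the manifold.
walk_curve = np.array([1.0, 0.5])
jump_curve = np.array([0.2, 2.0])

p_start = manifold_point(walk_curve, 0.0)
p_wrapped = manifold_point(walk_curve, 1.0)  # one full cycle later
p_other = manifold_point(jump_curve, 0.0)    # same phase, different curve
```

Under this assumed form, the amplitude selects which curve a motion lies on (its semantic cluster), while the phase selects the position along that curve (its timing).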

Paper Structure (26 sections, 9 equations, 8 figures, 2 tables, 1 algorithm)

Figures (8)

  • Figure 1: Architecture of VQ-PAE. Starting with a short motion sequence $\mathbf{X} \in \mathbb{R}^{J \times T}$, the encoder learns an intermediate representation using convolution. The representation is fed into the timing and amplitude branches to predict the phase $\phi$, the frequency $f$, and the amplitude $\mathbf{A}$ of the pivot frame (rendered with mesh). A vector quantization (i.e., nearest-neighbor search) is used in the amplitude branch to ensure the structure of the phase manifold. Note that the codebook $\mathcal{A}$ is shared among multiple VQ-PAEs. We calculate the embedding $\mathbf{P}$ of the sequence assuming the frequency and amplitude stay constant within the sequence. The predicted phase manifold sequence is then passed through a convolutional decoder to reconstruct the input motion. Components with learnable parameters are marked in blue.
  • Figure 2: Details of the phase calculation module.
  • Figure 3: Overview of training multiple VQ-PAEs on heterogeneous datasets. A common phase manifold is guaranteed by using a shared codebook $\mathcal{A}$.
  • Figure 4: The running motions in the Dog and Human-Loco datasets have different frequencies. With frequency scaling, the motion with correct semantics is matched.
  • Figure 5: Motion retrieval. We retrieve motions at different frequencies in the same connected component containing motions of a dog moving up and down. From left to right the frequency decreases, corresponding to fast jumping, jumping up and sitting back, and slowly standing up and sitting back. Please refer to 1:17 in the accompanying video for a more comprehensive result.
  • ...and 3 more figures
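
Two steps named in the Figure 1 caption, nearest-neighbor quantization of the amplitude against a shared codebook and extending the pivot-frame phase across the window under a constant frequency, can be sketched as follows. This is a rough illustration rather than the authors' implementation: the function names, the codebook values, the frame rate, and the window size are all hypothetical.

```python
import numpy as np

def quantize(amplitude, codebook):
    """Return (index, entry) of the codebook row closest to `amplitude` (L2 distance)."""
    distances = np.linalg.norm(codebook - amplitude, axis=1)
    index = int(np.argmin(distances))
    return index, codebook[index]

def phase_window(pivot_phase, frequency, num_frames, fps):
    """Phases in cycles, wrapped to [0, 1), for a window centred on the pivot frame.

    Assumes the frequency (in Hz) stays constant over the whole window.
    """
    offsets = (np.arange(num_frames) - num_frames // 2) / fps  # seconds from pivot
    return (pivot_phase + frequency * offsets) % 1.0

# Hypothetical shared codebook with three amplitude entries.
codebook = np.array([[1.0, 0.5], [0.2, 2.0], [3.0, 3.0]])

# Made-up amplitudes, standing in for the outputs of two character-specific encoders.
human_index, human_entry = quantize(np.array([0.9, 0.6]), codebook)
dog_index, dog_entry = quantize(np.array([1.1, 0.4]), codebook)

# Phase of every frame in a 5-frame window around a pivot at phase 0.25.
phases = phase_window(pivot_phase=0.25, frequency=2.0, num_frames=5, fps=30.0)
```

Because both encoders index the same codebook, the two predicted amplitudes in this toy example snap to the same entry, so the motions share a curve and differ only in phase.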