MoReFlow: Motion Retargeting Learning through Unsupervised Flow Matching

Wontaek Kim, Tianyu Li, Sehoon Ha

TL;DR

MoReFlow addresses cross-morphology motion retargeting without paired data by combining a per-character VQ-VAE motion tokenizer with a conditional flow-matching model operating in tokenized latent spaces. It enables controllable, reversible retargeting across diverse morphologies by learning motion correspondences and allowing different alignment objectives, such as local style preservation or world-space task alignment. The approach uses classifier-free guidance and multi-sample condition coupling to train both unconditional and conditional flows, yielding high-quality, realistic motions with improved diversity and semantic alignment compared to baselines. The results demonstrate strong generalization across humanoid and quadruped robots and offer a scalable framework for cross-character animation and robotics tasks.

Abstract

Motion retargeting holds the promise of offering a larger set of motion data for characters and robots with different morphologies. Many prior works have approached this problem via either handcrafted constraints or paired motion datasets, limiting their applicability to humanoid characters or narrow behaviors such as locomotion. Moreover, they often assume a fixed notion of retargeting, overlooking domain-specific objectives like style preservation in animation or task-space alignment in robotics. In this work, we propose MoReFlow, Motion Retargeting via Flow Matching, an unsupervised framework that learns correspondences between characters' motion embedding spaces. Our method consists of two stages. First, we train tokenized motion embeddings for each character using a VQ-VAE, yielding compact latent representations. Then, we employ flow matching with conditional coupling to align the latent spaces across characters, learning conditioned and unconditioned matching simultaneously to achieve robust yet flexible retargeting. Once trained, MoReFlow enables flexible and reversible retargeting without requiring paired data. Experiments demonstrate that MoReFlow produces high-quality motions across diverse characters and tasks, offering improved controllability, generalization, and motion realism compared to the baselines.
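As a rough illustration of the tokenization in the first stage, the sketch below shows the nearest-neighbor quantization step that a VQ-VAE typically applies to encoder outputs. It is a minimal sketch, not the authors' implementation: the function names (`quantize`, `lookup`), the toy two-dimensional codebook, and the squared Euclidean distance are all assumptions made for illustration, and the encoder and decoder networks are omitted.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each latent vector to the index of its nearest codebook entry.

    Hypothetical helper: in a real VQ-VAE the latents come from a trained
    encoder and the codebook is learned jointly with it.
    """
    tokens = []
    for z in latents:
        distances = [squared_distance(z, entry) for entry in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens


def lookup(tokens: Sequence[int],
           codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Replace each token by its codebook vector (the decoder's input)."""
    return [list(codebook[t]) for t in tokens]


# Toy data (assumed): three codebook entries and three encoder outputs.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
latents = [[0.1, -0.1], [0.9, 0.2], [0.2, 0.8]]

tokens = quantize(latents, codebook)
quantized = lookup(tokens, codebook)
```

Each character would have its own codebook under this scheme, so a token index only has meaning relative to the character whose tokenizer produced it.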

Paper Structure

This paper contains 34 sections, 19 equations, 2 figures, 3 tables, 1 algorithm.

Figures (2)

  • Figure 1: MoReFlow enables diverse source motions to be retargeted across different target characters with controllable outcomes. The source motions include both dynamic whole-body movements and fine-grained manipulation tasks. The target characters range from smaller humanoid robots to morphologically distinct platforms such as Spot. Our framework can generate multiple retargeted variations to accommodate different user preferences.
  • Figure 2: Overview of the proposed MoReFlow framework. Each character ($C^{\text{src}}$ and $C^{\text{tgt}}$) has a pretrained VQ-VAE tokenizer consisting of an encoder, a decoder, and a codebook. A source motion is first encoded and quantized into tokens from the source codebook. The flow matching model then maps the token distribution from the source codebook to the target codebook, optionally conditioned on task requirements, such as local style alignment or world-frame alignment. The retargeted motion is reconstructed using the pretrained target decoder.
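The mapping step in Figure 2 can be sketched with the generic flow-matching recipe combined with classifier-free guidance. This is a minimal sketch under stated assumptions, not the paper's formulation: it assumes a straight-line path between a source point and a target point, a fixed-step Euler solver, and a scalar guidance scale, and it replaces the trained velocity network with two constant toy fields (`toy_uncond`, `toy_cond`). All names here are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight path from x0 (t = 0) to x1 (t = 1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Regression target for the straight path: its constant time derivative."""
    return [b - a for a, b in zip(x0, x1)]


def guided_velocity(v_uncond: VelocityFn, v_cond: VelocityFn,
                    x: Vector, t: float, scale: float) -> Vector:
    """Classifier-free guidance: move from the unconditional velocity
    toward the conditional one by a factor of `scale`."""
    u = v_uncond(x, t)
    c = v_cond(x, t)
    return [ui + scale * (ci - ui) for ui, ci in zip(u, c)]


def euler_sample(x0: Vector, v_uncond: VelocityFn, v_cond: VelocityFn,
                 scale: float, steps: int) -> Vector:
    """Integrate the guided velocity field from t = 0 to t = 1."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = guided_velocity(v_uncond, v_cond, x, k * dt, scale)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-ins (assumed) for a trained velocity network evaluated
# without and with a condition.
def toy_uncond(x: Vector, t: float) -> Vector:
    return [1.0, 0.0]


def toy_cond(x: Vector, t: float) -> Vector:
    return [1.0, 1.0]


sample = euler_sample([0.0, 0.0], toy_uncond, toy_cond, scale=2.0, steps=4)
```

With `scale=1.0` the guided field reduces to the conditional one, and with `scale=0.0` to the unconditional one. Training a single network on both cases lets the balance between them be chosen at sampling time.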