MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning

Xiaoyang Liu, Yunyao Mao, Wengang Zhou, Houqiang Li

TL;DR

This work introduces MotionRL, the first approach to utilize Multi-Reward Reinforcement Learning for optimizing text-to-motion generation tasks and aligning them with human preferences, and introduces a novel multi-objective optimization strategy to approximate Pareto optimality between text adherence, motion quality, and human preferences.

Abstract

We introduce MotionRL, the first approach to utilize Multi-Reward Reinforcement Learning (RL) for optimizing text-to-motion generation tasks and aligning them with human preferences. Previous works focused on improving numerical performance metrics on the given datasets, often neglecting the variability and subjectivity of human feedback. In contrast, our approach uses reinforcement learning to fine-tune the motion generator based on the prior knowledge of human preferences encoded in a human perception model, allowing it to generate motions that better align with human preferences. In addition, MotionRL introduces a novel multi-objective optimization strategy to approximate Pareto optimality between text adherence, motion quality, and human preferences. Extensive experiments and user studies demonstrate that MotionRL not only allows control over the generated results across different objectives but also significantly enhances performance across these metrics compared to other algorithms.

Paper Structure (24 sections, 12 equations, 7 figures, 2 tables, 1 algorithm)

Figures (7)

  • Figure 1: Examples generated by MoMask and by our method. Our method significantly outperforms the previous state-of-the-art MoMask in text adherence, motion quality, and human preferences.
  • Figure 2: The overall pipeline of MotionRL. Given a text input, the Transformer serves as a motion generator, first producing multiple motions as a batch. Various rewards are then computed for these motions. Within this batch of motions, the Pareto set is identified. Finally, using the rewards from the Pareto set, along with the outputs of the critic model and the prediction logits, the motion generator is optimized using the PPO algorithm (note that the critic model is omitted in the diagram).
  • Figure 3: Human Preferences Evaluation. (a) Perceptual scores on the test set using the pretrained perception model from MotionCritic. The results show that our method aligns more closely with human perception than other approaches. (b) Comparison of human evaluations between our method and others. The results demonstrate that our method generates motions that are more consistent with human preferences.
  • Figure 4: Qualitative comparisons with top-performing methods. Our MotionRL exhibits better motion generation quality.
  • Figure 5: Impact of Pareto Selection and Reward-Specific Tokens. The figure illustrates the effectiveness of the proposed Pareto selection in enhancing the model's overall reward value. It also shows how different reward-specific tokens allow trade-offs between optimization goals, improving the balance among motion quality, text adherence, and human preferences in the generated outputs.
  • ...and 2 more figures
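
The Figure 2 caption says that a Pareto set is identified within each batch of generated motions before the PPO update. The paper's exact selection rule is not reproduced on this page, so the following is only a minimal sketch of standard non-dominated filtering, assuming that every reward is "higher is better" and that each motion carries one reward per objective. The function names (`dominates`, `pareto_set`) and the reward values are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if reward vector `a` Pareto-dominates `b` (higher is better).

    `a` dominates `b` when it is at least as good on every objective
    and strictly better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_set(rewards: Sequence[Sequence[float]]) -> List[int]:
    """Return indices of batch samples that no other sample dominates."""
    return [
        i
        for i, r_i in enumerate(rewards)
        if not any(dominates(r_j, r_i) for j, r_j in enumerate(rewards) if j != i)
    ]


# Hypothetical rewards for four motions generated from one text prompt.
# Columns (assumed order): text adherence, motion quality, human preference.
batch_rewards = [
    (0.9, 0.2, 0.5),
    (0.4, 0.8, 0.6),
    (0.3, 0.1, 0.4),  # worse than sample 0 on every objective, so it is dropped
    (0.9, 0.2, 0.5),  # identical to sample 0; ties do not dominate each other
]
selected = pareto_set(batch_rewards)
print(selected)
```

In this sketch only the samples indexed by `selected` would contribute rewards to the policy update. How the selected rewards are combined with the critic outputs and prediction logits in the PPO objective is described in the paper itself and is not modeled here.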