MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning

Xiaoyang Liu; Yunyao Mao; Wengang Zhou; Houqiang Li

TL;DR

This work introduces MotionRL, the first approach to use Multi-Reward Reinforcement Learning to optimize text-to-motion generation and align it with human preferences. It also proposes a novel multi-objective optimization strategy to approximate Pareto optimality between text adherence, motion quality, and human preferences.

Abstract

We introduce MotionRL, the first approach to utilize Multi-Reward Reinforcement Learning (RL) for optimizing text-to-motion generation tasks and aligning them with human preferences. Previous works focused on improving numerical performance metrics on the given datasets, often neglecting the variability and subjectivity of human feedback. In contrast, our novel approach uses reinforcement learning to fine-tune the motion generator based on the human-preference prior knowledge of a human perception model, allowing it to generate motions that better align with human preferences. In addition, MotionRL introduces a novel multi-objective optimization strategy to approximate Pareto optimality between text adherence, motion quality, and human preferences. Extensive experiments and user studies demonstrate that MotionRL not only allows control over the generated results across different objectives but also significantly enhances performance across these metrics compared to other algorithms.

Paper Structure (24 sections, 12 equations, 7 figures, 2 tables, 1 algorithm)

This paper contains 24 sections, 12 equations, 7 figures, 2 tables, 1 algorithm.

Introduction
Related Work
Human Motion Generation
Human Feedbacks of Generated Motion
Reinforcement Learning with Human Feedbacks
Preliminary
Autoregressive Model for Motion Generation
Reward-specific Token Design and Sampling
Methodology
Multi-Reward Design
Batch-wise Pareto-optimal Selection
Pareto-based Policy Gradient Optimization
Experiments
Experiment Setting
Quantitative Evaluation
...and 9 more sections

Figures (7)

Figure 1: Examples generated by MoMask and Ours. Our method significantly outperforms the previous state-of-the-art MoMask in text adherence, motion quality and human preferences.
Figure 2: The overall pipeline of MotionRL. Given a text input, the Transformer serves as a motion generator, first producing multiple motions as a batch. Various rewards are then computed for these motions. Within this batch of motions, the Pareto set is identified. Finally, using the rewards from the Pareto set, along with the outputs of the critic model and the prediction logits, the motion generator is optimized using the PPO algorithm (note that the critic model is omitted in the diagram).
Figure 3: Human Preferences Evaluation. (a) Perceptual scores on the test set using the pretrained perception model from MotionCritic. The results show that our method aligns more closely with human perception than other approaches. (b) Comparison of human evaluations between our method and others. The results demonstrate that our method generates motions that are more consistent with human preferences.
Figure 4: Qualitative comparisons with top-performing methods. Our MotionRL exhibits better motion generation quality.
Figure 5: Impact of Pareto Selection and Reward-Specific Tokens. The figure illustrates the effectiveness of our proposed Pareto selection in enhancing the model's overall reward value. It also demonstrates how using different reward-specific tokens allows for trade-offs between various optimization goals, improving the balance between motion quality, text adherence, and human preferences in the generated outputs.
...and 2 more figures
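
The Figure 2 caption says that a Pareto set is identified within each batch of generated motions before the policy update. The selection procedure itself is not given on this page, so the following is only a minimal sketch using the standard definition of a non-dominated set. The function names `dominates` and `pareto_indices` and the reward values are hypothetical, and every reward is assumed to be "higher is better".

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every reward
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_indices(rewards: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of the samples that no other sample in the batch dominates."""
    return [
        i
        for i, r in enumerate(rewards)
        if not any(dominates(other, r) for j, other in enumerate(rewards) if j != i)
    ]


# Hypothetical reward vectors for one batch of generated motions:
# (text adherence, motion quality, human preference). Values are made up.
batch_rewards = [
    (0.9, 0.2, 0.5),
    (0.5, 0.5, 0.5),
    (0.4, 0.4, 0.4),  # dominated by the sample above it
    (0.1, 0.9, 0.3),
]

selected = pareto_indices(batch_rewards)
print(selected)  # [0, 1, 3]
```

In this sketch only the selected samples would contribute to the subsequent policy-gradient step. The pairwise comparison is quadratic in the batch size, which is usually acceptable for the small batches sampled per text prompt.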
