McSc: Motion-Corrective Preference Alignment for Video Generation with Self-Critic Hierarchical Reasoning

Qiushi Yang; Yingjie Chen; Yuan Yao; Yifang Men; Huaizhuo Liu; Miaomiao Cui

TL;DR

The paper tackles the challenge of aligning text-to-video generation with human preferences, which are inherently multi-dimensional and subjective. It introduces McSc, a three-stage reinforcement learning framework comprising ScDR for per-dimension reasoning, HCR for holistic comparison, and McDPO for motion-aware preference optimization. A self-critic reward model and hierarchical reasoning are trained to mimic human decision logic, while a motion-corrective weighting scheme mitigates bias towards low-motion content during alignment. Empirical results show state-of-the-art preference alignment and higher-motion video outputs across benchmarks, demonstrating the method's effectiveness and potential impact on practical T2V systems.

Abstract

Text-to-video (T2V) generation has achieved remarkable progress in producing high-quality videos aligned with textual prompts. However, aligning synthesized videos with nuanced human preferences remains challenging due to the subjective and multifaceted nature of human judgment. Existing video preference alignment methods rely on costly human annotations or use proxy metrics to predict preference, and therefore lack an understanding of the logic behind human preferences. Moreover, they usually align T2V models directly with the overall preference distribution, ignoring potentially conflicting dimensions such as motion dynamics and visual quality, which may bias models towards low-motion content. To address these issues, we present Motion-corrective alignment with Self-critic hierarchical Reasoning (McSc), a three-stage reinforcement learning framework for robust preference modeling and alignment. Firstly, Self-critic Dimensional Reasoning (ScDR) trains a generative reward model (RM) to decompose preferences into per-dimension assessments, using self-critic reasoning chains for reliable learning. Secondly, to achieve holistic video comparison, we introduce Hierarchical Comparative Reasoning (HCR) for structured multi-dimensional reasoning with hierarchical reward supervision. Finally, using RM-preferred videos, we propose Motion-corrective Direct Preference Optimization (McDPO) to optimize T2V models while dynamically re-weighting the alignment objective to mitigate bias towards low-motion content. Experiments show that McSc achieves superior performance in human preference alignment and generates videos with high motion dynamics.
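
To make the re-weighting idea concrete, the following is a minimal sketch of a motion-weighted DPO objective. It is an illustration under stated assumptions, not the paper's McDPO formulation: the abstract does not give the weighting rule, so `motion_weight`, its sigmoid form, the `alpha` parameter, and the per-video motion scores are all hypothetical. Only `dpo_loss` follows the standard DPO loss, and it uses generic log-likelihoods rather than any diffusion-specific surrogate.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_*     : log-likelihood under the model being trained
    ref_logp_* : log-likelihood under the frozen reference model
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


def motion_weight(motion_w, motion_l, alpha=1.0):
    """HYPOTHETICAL weighting rule (not taken from the paper).

    Up-weights pairs whose preferred video has more motion than the
    rejected one and down-weights the opposite case. Returns a value
    in (0, 2) that equals 1 when both motion scores are equal.
    """
    gap = motion_w - motion_l
    return 2.0 / (1.0 + math.exp(-alpha * gap))


def weighted_batch_loss(pairs, beta=0.1, alpha=1.0):
    """Mean of motion-weighted DPO losses over a batch of pairs.

    Each pair is a dict with keys: logp_w, logp_l, ref_logp_w,
    ref_logp_l, motion_w, motion_l (all assumed names).
    """
    total = 0.0
    for p in pairs:
        w = motion_weight(p["motion_w"], p["motion_l"], alpha)
        total += w * dpo_loss(
            p["logp_w"], p["logp_l"], p["ref_logp_w"], p["ref_logp_l"], beta
        )
    return total / len(pairs)
```

In this sketch, a pair whose preferred video is nearly static while the rejected one is dynamic contributes less to the gradient, which is one simple way such a scheme could counteract a drift towards low-motion outputs.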
